In the BeatAML cohort we found 28 variants of interest, 25 of which can be validated with RNASeq data.

The samples of interest are those in which the variant was called in either DNA or RNA and for which RNASeq data are available for validation. The splice junction search is based on the STAR-SJCounts 1-based intronic positions.
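The junction columns queried in this report are named chr_SJstart_SJend. As a minimal sketch, assuming 1-based coordinates of the first and last intronic base, such a key could be assembled as shown here. `make_sj_key` is a hypothetical helper introduced for illustration, not part of the original analysis.

```r
# Hypothetical helper: build a "chr_start_end" junction key from
# 1-based intron coordinates (first and last intronic base).
make_sj_key <- function(chr, intron_start, intron_end) {
  stopifnot(all(intron_start <= intron_end))
  # %d avoids scientific notation for large genomic positions.
  sprintf("%s_%d_%d", chr, as.integer(intron_start), as.integer(intron_end))
}

make_sj_key("chr1", 114713979, 114716126)
```

The resulting string can be compared directly against `colnames(GeneSJ)`.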

Splicing alterations to be evaluated:

1 NRAS chr1,114716123,C,T

Variant found in 19 patients of the BeatAML cohort (20 samples).

  • Patients with NRAS chr1,114716123,C,T variant: 19 patients (20 samples)
  • Patients with the variant and RNASeq for validation: 19 patients (17 samples)

The splicing alterations being assessed are:

  • Donor Gain: chr1:114716126, found in the mutated samples.
  • Exon 2 Skipping: chr1:114713979-114716657, found in the mutated samples.
  • Donor Gain: chr1:114713979-114716076, found in the splice junction collection, but not found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"NRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="NRAS" & found_variants$MutationKey_Hg38 == "chr1,114716123,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
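Since `%in%` silently ignores identifiers that are absent from the junction table, it may be worth checking that every validable case actually received a MUT label. A minimal sketch on an invented toy table (the sample names `S1` to `S9` are placeholders, not BeatAML identifiers):

```r
# Toy junction table and toy list of validable cases (hypothetical IDs).
toy_sj <- data.frame(sample_id = c("S1", "S2", "S3", "S4"))
cases <- c("S2", "S4", "S9")   # S9 has no junction data in this toy set

toy_sj$GROUP <- ifelse(toy_sj$sample_id %in% cases, "MUT", "WT")
table(toy_sj$GROUP)

# Validable cases that never received a MUT label.
missing_cases <- setdiff(cases, toy_sj$sample_id)
missing_cases
```

An empty `missing_cases` would indicate that the MUT group size matches the number of validable samples.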

1.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

1.1.1 Donor Gain

Search: chr1:114716126

Show all the splice junctions containing positions 114716120 to 114716129:

colnames(GeneSJ)[grep("11471612",colnames(GeneSJ))]
## [1] "chr1_114713979_114716126"

Found: chr1:114713979-114716126
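The substring search above only covers windows aligned to a power of ten. As an alternative sketch, the column names can be parsed into numeric coordinates and searched within an arbitrary window around a position. `find_sj_near` and the toy column names are assumptions made for illustration.

```r
# Hypothetical helper: return junction columns ("chr_start_end") whose
# start or end lies within `window` bp of `pos`.
find_sj_near <- function(sj_names, pos, window = 5) {
  pattern <- "^chr[^_]+_([0-9]+)_([0-9]+)$"
  sj <- sj_names[grepl(pattern, sj_names)]   # drop non-junction columns
  start <- as.numeric(sub(pattern, "\\1", sj))
  end   <- as.numeric(sub(pattern, "\\2", sj))
  sj[abs(start - pos) <= window | abs(end - pos) <= window]
}

toy_names <- c("sample_id", "chr1_114713979_114716049",
               "chr1_114713979_114716126", "chr1_114716178_114716657")
find_sj_near(toy_names, pos = 114716126, window = 5)
```

With the real data, `toy_names` would be replaced by `colnames(GeneSJ)`.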

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr1_114713979_114716126
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0

Samples with the SJ of interest:

table(GeneSJ$chr1_114713979_114716126>0) 
## 
## FALSE  TRUE 
##   455     2

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr1_114713979_114716126 > 0])
## 
## MUT  WT 
##   1   1

Alternative SJ found in 1 of the mutated samples and in 1 WT sample.
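Because the junction can appear in both groups, a two-way table of group against junction presence makes the per-group denominators explicit. A minimal sketch on invented toy values:

```r
# Toy data: group label and read count of one junction per sample.
toy <- data.frame(
  GROUP = c("MUT", "MUT", "WT", "WT", "WT"),
  reads = c(1, 0, 0, 1, 0)
)

# Rows: group; columns: whether the junction has at least one read.
presence <- table(group = toy$GROUP, has_junction = toy$reads > 0)
presence

# Fraction of samples per group that carry the junction.
prop.table(presence, margin = 1)
```

The same call with `GeneSJ$GROUP` and a junction column would report the samples without the junction alongside those with it.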

1.1.2 Exon Skipping 2

Search: chr1:114713979-114716657

Show all the splice junctions containing the position 114713979

colnames(GeneSJ)[grep("114713979",colnames(GeneSJ))]
##  [1] "chr1_114713979_114714441" "chr1_114713979_114715360"
##  [3] "chr1_114713979_114715936" "chr1_114713979_114715946"
##  [5] "chr1_114713979_114715953" "chr1_114713979_114716015"
##  [7] "chr1_114713979_114716049" "chr1_114713979_114716076"
##  [9] "chr1_114713979_114716126" "chr1_114713979_114716152"
## [11] "chr1_114713979_114716657"

Found: chr1:114713979-114716657

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr1_114713979_114716657
##   [1]   0  14   0   4   7   0   0   1   3   0   0   0   8  31   0  15   0   0
##  [19]  12   0  48   1   4  15  68   7   7   1  10   2   1   3   4   0   1   8
##  [37]   1   6   4  42   0   0   3   1   3  21   0   5   0   2   1  33   3   0
##  [55]   3   5   7  28   5   0   4  28   0  16   2   9   0   5   0   0   0  10
##  [73]   7   2   9   0   4   1  50   0   1   0 110   7   9   0   8   0   5   2
##  [91]   5   4  17  26  55   1   0   1   0   0   5   0   1  12   0   0   6  11
## [109]   0   0   1   0   4   0  19   1   1   0   2   6   3  19   0   2   0  20
## [127]  21   0   3   4  13   0   1   4   8   0   0   8   4   4   3   3  16  33
## [145]   0  67   3 138   0   1   0  19   3   0   0   2   3  14   3   5   0  14
## [163]   0   4   0   0  28   2   0   2  33   4  15  12   1  11   2   0   0   1
## [181]   0   0   0   2  12   0   0   0   0   1   3  15 100   2   0   8   0   0
## [199]   1   1   2  45   5  23   3   0   2   4   2   6   4   8   9   1  44   5
## [217]  38   0   4   6   8  14   3   1   1 147   2   4   0   1   0  10  12   7
## [235]   4   0   5   0   0   2   0  34   5   0   8  74   6   0   7   3   3  14
## [253]   0   0   0   2  43   1   6   0   0   0   3   0   1  45   1   3   4   0
## [271]  27   5   2   0   4   0   0   0   1   1   3   1   8  59   2   4  33   5
## [289]   4   4   1   8  92  12   0   1   7   0   2   1   2   2   8   2   2   0
## [307]   5   4   1   7   0   0   3   2   2   0   6   1   7  11   2   0  56   1
## [325]   5   0   0   2   8   3   8  13  17   3  18   4   0   0   6   1   2   6
## [343]   3   4   0   0   7   1   3   2  79  10   4   0  10   8  46   3  16  10
## [361]   2   6   0   1  34   1   0  10  63   1   1   3   2   8  10   5   1   7
## [379]   0   1   0   6   3   6   4   0   4   1   0   4   5   0   1   0   2   0
## [397]   4   0   0   0   0   5   0   0   2   3   1   5   8   0   0   9   0   5
## [415]   2   5   5   1   0  16  13   0  10   8   0   7  48   1   2   0   0   0
## [433]   0   4  12   6   1   9   0   7   6   2   0   7   6   0   0  12   1   0
## [451]   1   2   2   0  14   1   0

Samples with the SJ of interest:

table(GeneSJ$chr1_114713979_114716657>0) 
## 
## FALSE  TRUE 
##   135   322

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr1_114713979_114716657 > 0])
## 
## MUT  WT 
##  12 310

Alternative SJ found in 12 mutated samples and in 310 WT samples.

1.1.3 Donor Gain 2

Search: chr1:114713979-114716076

Show all the splice junctions containing positions 114716070 to 114716079:

colnames(GeneSJ)[grep("11471607",colnames(GeneSJ))]
## [1] "chr1_114713979_114716076"

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr1_114713979_114716076
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0

Samples with the SJ of interest:

table(GeneSJ$chr1_114713979_114716076 >0) 
## 
## FALSE  TRUE 
##   456     1

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr1_114713979_114716076 > 0])
## 
## WT 
##  1

Alternative SJ not found in the mutated samples of the splice junction collection.

1.1.4 Canonical SJ

Exon1-2: chr1:114716178-114716657

Exon2-3: chr1:114713979-114716049

Reads of all the AML samples (mutated and non-mutated) for the canonical exon 1-2 splice junction:

GeneSJ$chr1_114716178_114716657
##   [1]  60 249  71   7  68  75 108 101  69  49  99  74 182 219 111  94  96  95
##  [19] 172  78 133  88  65 441 138 179  64   6  95 116  92 135  83  69  47 186
##  [37]  74 142 131 182 112  76 157 118  80 222   7 111  58 123 146  99  67  66
##  [55] 188 115  83 271  66 137 109  58  51 139  82 128  75  79  90  74 115  95
##  [73] 165 107 140  73 235 150 133 110  68  46 131  85 125  52 128 140  90  75
##  [91] 109 152 180 162  89  81 119 133  64 120  80  61 110  87 104 165 178 120
## [109]  36  55 158  59 111  98 104 125 107 113  53 170  71 179 121  91 136 165
## [127] 194  64 137 115  87  56  71  99  94  71  74 180  90  92 125 110 225  83
## [145]  95  80  31 163 120  92  83 531 132  96 111 121 153 116  77  94  85 294
## [163]  47  84  98  61 107  62  58 162 160  93 143 216  49 232 150  81  84 327
## [181] 102 204  78 108 133 148  62  42 113  41  54  67 111  62  84  95  96 109
## [199]  79  66 100 119 143 255  56 119 122 127 100 139  87 194  69  91 241 155
## [217] 139  93  96 251  92 228  70 241  70 263  79 112  88  61 103 180 134  26
## [235]  70  57  68 119  72  69  90 129  87  60  60 170 185   2  77  97 118 115
## [253]  70  86  97 134  92 159 138  66  95  75 100 101  54 246 111  92 101 120
## [271] 427 110 123  67  78  56  37  83 128  69 107 112 136 172  67 148  95 103
## [289]  29 112  85  93 275 145 121 118  57 134 129  28 155 211 107  60  69  83
## [307]  85 142  79 236  85  21 113  62 181  72 206  52 171 114  53  66  98  99
## [325]  70  82  37  65  96  22 105 132 229  80 155 107 109 117  96 113  90 132
## [343]  41 138 102  67 145  80 116 118 102  98 239  62 136 114 153 210 111  99
## [361]  87  90 126  87 123  78  87 126  79 147 120 103  90 128 132  70  78 139
## [379] 125 189   0  73 136 168  74  93 133 123 117  89 126 100  69  19 136 124
## [397]  90  53 104 101 115  55 107  70 127 143  68 167 137  35 149  93  57 146
## [415] 164 197 103  40  59 135  69 107 163 250  64 154 189  58  83 128  62  46
## [433]   0  72  54 306 106 140 139 152 128 225 180  90 190  54  24 127 130 107
## [451]  56 126 109 162 143  48 122

Reads of all the AML samples (mutated and non-mutated) for the canonical exon 2-3 splice junction:

GeneSJ$chr1_114713979_114716049
##   [1]  98 156 122   0 124 195 223 216  62 127 117 109 158 475 173 153  91 204
##  [19] 175  83 174 194  58 449 133 195  86   2  94 137 228 148  76  76  96 149
##  [37]  73 309 108 287 339  85 184 112  89 292  51  67 141 103 130 133  56 136
##  [55] 294 264  72 298  69  95 168  96  58 245  91 143 212 119  80  64 185  77
##  [73] 137 150 146 118 414 300 142 168 177  69 222 116 142 137 238 158 113 100
##  [91] 309 308 328 229 139 204 132 104  80 269 137 154 113  70 163 189 195  79
## [109]  89  81 227 107 152 179  69 184 316 125  93 114 120 240 109 169 288 209
## [127] 176  84 220  89 122  76  67 277  93  65  87 213 109  70  99 124 355  79
## [145]  87 162  46 202 210 108 121 510 219 116 175 174 203 159  82 183  97 302
## [163]  42 142 208  70 187  72  52 127 154 122 285 145  95 205 285 155  64 284
## [181] 170 162 176 105 245 217 114  84 103  77 129 163 189  48 134 111  85 115
## [199] 162  68  99 100 303 289  74 106 258 180 163 144 164 219  51 179 248 154
## [217] 105 136  76 233  73 239 117 264 129 251  96  98  99 117 100 321 200  38
## [235] 161 123  70 206  97 132 131 127 142 181  98 300 197   1 106 120 112 137
## [253]  88 149  92 152  96 165 194  49 101  41  81 165  85 166 186 117  85 133
## [271] 352  93 118 161  97  78 105  72 190 121 165  97 126 287  78 141  73 195
## [289]  71  94 123 378 216 118  95 143 184 124 120 163 177 279 164 102 106 142
## [307]  77 117  64 125 184  35 188  51 197  65 146  99 215 162  91 130 122  94
## [325] 116 160 149  90 177  48 123 113 341 100 197 136 174 156 135  87  79 182
## [343] 177 208  77 156 137 159 234  94 128 209 275 143 119 142 146 167 154 151
## [361] 224 140 254  93  94  85  97 160 212 170 125 205 108 103 280 134 155 247
## [379] 194 129   0  63  83 177 148 177 192 113 135  42 160 197 131   7  96  86
## [397]  75  85 101 167 116 132  90  72 135 274  46 292 286  72 149  80  52  86
## [415] 124 214 231 106 135 247 134 132 280 179 100 257 138 151 102 188 132  59
## [433]   2 133 110 285 119 172 124 164 110 160 206 102 213  60  21 113 143 247
## [451] 118 167 214 129 265 118 269

1.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx1_2 <- (GeneSJ$chr1_114716178_114716657)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx2_3 <- (GeneSJ$chr1_114713979_114716049)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_DG <- (GeneSJ$chr1_114713979_114716126)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES2 <- (GeneSJ$chr1_114713979_114716657)/GeneSJ$rowSum_SJtotal*100
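The normalization above expresses each junction as a percentage of all junction reads of the gene in the same sample. In R, a sample with no junction reads at all gives 0/0, which is `NaN`. The sketch below, on an invented toy table, wraps the same formula in a hypothetical helper `normalize_sj` that returns `NA` in that case.

```r
# Hypothetical helper: junction reads as a percentage of all junction
# reads of the gene in the same sample; NA when the gene has no reads.
normalize_sj <- function(sj_reads, total_reads) {
  ifelse(total_reads > 0, sj_reads / total_reads * 100, NA_real_)
}

# Toy table with two junction columns and three samples.
toy <- data.frame(chr1_10_20 = c(90, 0, 0), chr1_10_30 = c(10, 5, 0))
toy$total <- rowSums(toy[, grep("^chr", names(toy))])
toy$norm_alt <- normalize_sj(toy$chr1_10_30, toy$total)
toy$norm_alt
```

Anchoring the pattern as `^chr` is a choice made for this sketch so that only columns whose names start with `chr` enter the total.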

1.3 VAF

VAF of the mutated samples:

1.4 Plots

1.4.1 Static Dot Plots

Canonical SJ:

Splicing alterations:

1.4.2 Interactive Dot Plots

Canonical SJ:

Splicing alteration:

1.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

1.5 Statistical Analysis

SJCounts <- GeneSJ #BeatAML.NRAS.chr1-114716123-C-T.xlsx

1.5.1 Donor Gain

1.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_DG[SJCounts$GROUP == "WT"]
## W = 0.023436, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0001201865

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_DG[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 0.00000000 0.00000000 0.00000000 0.05428882 0.00000000 0.00000000
##  [7] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## [13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_DG - mean_WT_SJi

Difference in the MUT samples, i.e. the deviation of each MUT sample's normalized expression from the mean normalized WT expression:

SJCounts[SJCounts$GROUP == "MUT", c("sample_id", "Difference")]
##     sample_id    Difference
## 42    BA2093R -0.0001201865
## 47    BA2098R -0.0001201865
## 49    BA2101R -0.0001201865
## 87    BA2218R  0.0541686300
## 115   BA2276R -0.0001201865
## 126   BA2301R -0.0001201865
## 191   BA2470R -0.0001201865
## 219   BA2523R -0.0001201865
## 236   BA2564R -0.0001201865
## 286   BA2691R -0.0001201865
## 307   BA2731R -0.0001201865
## 315   BA2748R -0.0001201865
## 345   BA2822R -0.0001201865
## 361   BA2851R -0.0001201865
## 383   BA2901R -0.0001201865
## 393   BA2914R -0.0001201865
## 414   BA2956R -0.0001201865

1.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:2] = -0.00012019, 0.052762
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.9977273 0.9977273 0.9977273 1.0000000 0.9977273 0.9977273 0.9977273
##  [8] 0.9977273 0.9977273 0.9977273 0.9977273 0.9977273 0.9977273 0.9977273
## [15] 0.9977273 0.9977273 0.9977273
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr1:114713979-114716126"
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
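The ECDF value of a mutated sample is the fraction of WT samples whose value is less than or equal to it. Subtracting the WT mean from both groups shifts everything by the same constant, so the percentiles are identical to those computed on the normalized expression itself. This also suggests why samples with zero reads are masked: a zero still scores 0.9977273 above, which matches 439 of 440 WT samples also being zero. A minimal sketch on invented toy values:

```r
wt  <- c(0, 0, 0, 0.2, 0.5)   # toy WT normalized expression
mut <- c(0, 0.3)              # toy MUT normalized expression

# Fraction of WT samples at or below each MUT value.
wt_ecdf <- ecdf(wt)
raw <- wt_ecdf(mut)
raw

# Same percentiles after subtracting the WT mean from both groups.
shifted <- ecdf(wt - mean(wt))(mut - mean(wt))
all.equal(raw, shifted)

# Mask samples without reads, as done for MUT_df$ECDF.
ifelse(mut == 0, NA, raw)
```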

1.5.2 Exon Skipping

1.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.56704, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.4989328

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 0.0000000 0.0000000 0.0000000 0.4343105 2.3058252 0.9638554 0.2811621
##  [8] 0.4629630 0.0000000 0.3218021 0.5347594 0.1421464 0.0000000 0.1232286
## [15] 0.3064351 0.1039501 0.4566210

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES2 - mean_WT_SJi

Difference in the MUT samples, i.e. the deviation of each MUT sample's normalized expression from the mean normalized WT expression:

SJCounts[SJCounts$GROUP == "MUT", c("sample_id", "Difference")]
##     sample_id  Difference
## 42    BA2093R -0.49893283
## 47    BA2098R -0.49893283
## 49    BA2101R -0.49893283
## 87    BA2218R -0.06462230
## 115   BA2276R  1.80689241
## 126   BA2301R  0.46492259
## 191   BA2470R -0.21777070
## 219   BA2523R -0.03596987
## 236   BA2564R -0.49893283
## 286   BA2691R -0.17713074
## 307   BA2731R  0.03582652
## 315   BA2748R -0.35678642
## 345   BA2822R -0.49893283
## 361   BA2851R -0.37570425
## 383   BA2901R -0.19249770
## 393   BA2914R -0.39498273
## 414   BA2956R -0.04231183

1.5.2.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:302] = -0.49893, -0.45996, -0.45575,  ...,  4.702, 5.2631
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.2954545 0.2954545 0.2954545 0.6909091 0.9409091 0.8704545 0.5818182
##  [8] 0.7068182 0.2954545 0.6159091 0.7545455 0.4340909 0.2954545 0.4000000
## [15] 0.6000000 0.3795455 0.7022727
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES2")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr1:114713979-114716657"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

1.5.2.3 Mann-Whitney

Normality:

shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 0.56704, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"]
## W = 0.65546, p-value = 3.774e-05

Mann-Whitney:

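wt <- wilcox.test(x=SJCounts$Normalized_ES2[SJCounts$GROUP== "MUT"], 
                  y=SJCounts$Normalized_ES2[SJCounts$GROUP== "WT"],
                  alternative = "two.sided", 
                  paired = FALSE, 
                  conf.int = TRUE,    # logical flag: request the confidence interval
                  conf.level = 0.95)  # confidence level of that interval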
wt
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SJCounts$Normalized_ES2[SJCounts$GROUP == "MUT"] and SJCounts$Normalized_ES2[SJCounts$GROUP == "WT"]
## W = 3703, p-value = 0.9448
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.1358253  0.1232654
## sample estimates:
## difference in location 
##          -1.195008e-05
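In `wilcox.test`, `conf.int` is a logical switch, and the coverage of the interval is set separately through `conf.level` (0.95 by default). A minimal sketch on invented toy values, not taken from the cohort, showing how the p-value and interval could be pulled out of the result:

```r
# Toy normalized expression values (hypothetical, no ties).
mut <- c(0.10, 0.43, 0.96, 2.31)
wt  <- c(0.05, 0.22, 0.31, 0.47, 0.52, 0.88)

res <- wilcox.test(x = mut, y = wt,
                   alternative = "two.sided",
                   paired = FALSE,
                   conf.int = TRUE,     # request the interval
                   conf.level = 0.95)   # coverage of the interval

res$p.value    # two-sided p-value
res$conf.int   # interval for the location shift (MUT minus WT)
res$estimate   # estimated difference in location
```

Extracting `res$p.value` this way would allow a field such as `MUT_df$Pvalue` to be filled programmatically, if that were wanted.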

2 KRAS chr12,25245347,C,T

Variant found in 8 patients of the BeatAML cohort (8 samples).

  • Patients with KRAS chr12,25245347,C,T variant: 8 patients (8 samples)
  • Patients with the variant and RNASeq for validation: 7 patients (7 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 1bp from the variant, chr12:25245345, not found in the splice junction collection.
  • Donor Gain: chr12:25245281-25245348, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KRAS_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="KRAS" & found_variants$MutationKey_Hg38 == "chr12,25245347,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

2.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

2.1.1 Donor Gain

Search: predicted at 1bp from the variant 25245347, chr12:25245345

Show all the splice junctions containing the positions from 25245340 to 25245349

colnames(GeneSJ)[grepl("2524534", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

2.1.2 Donor Gain

Search: chr12:25245281-25245348

Show all the splice junctions containing the positions from 25245340 to 25245349

colnames(GeneSJ)[grepl("2524534", colnames(GeneSJ))] 
## character(0)

Show all the splice junctions containing the positions from 25245280 to 25245289

colnames(GeneSJ)[grepl("2524528", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

2.2 VAF

VAF of the mutated samples:

3 KMT2D chr12,49022063,G,A

Variant found in 4 patients of the BeatAML cohort (4 samples).

  • Patients with KMT2D chr12,49022063,G,A variant: 4 patients (4 samples)
  • Patients with the variant and RNASeq for validation: 4 patients (4 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr12:49022064, not found in the splice junction collection.
  • Donor Gain: chr12:49022070, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KMT2D_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="KMT2D" & found_variants$MutationKey_Hg38 == "chr12,49022063,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

3.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

3.1.1 Donor Gain

Search: chr12:49022064

Show all the splice junctions containing the positions from 49022060 to 49022069

colnames(GeneSJ)[grepl("4902206", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

3.1.2 Donor Gain

Search: chr12:49022070

Show all the splice junctions containing the positions from 49022070 to 49022079

colnames(GeneSJ)[grepl("4902207", colnames(GeneSJ))] 
## character(0)

Alternative SJ not found in the splice junction collection.

3.2 VAF

VAF of the mutated samples:

4 FLT3 chr13,28018485,G,T

Variant found in 3 patients of the BeatAML cohort (3 samples).

  • Patients with FLT3 chr13,28018485,G,T variant: 3 patients (3 samples)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Loss: chr13:28015702-28018466; donor splice site chr13:28018466.
  • Exon Skipping 20: chr13:28015702-28023349, found in mutated samples.
  • Exon Skipping 19: chr13:28018590-28024860, found in mutated samples.
  • Exon Skipping 19 + 20: chr13:28015702-28024860, found in mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"FLT3_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="FLT3" & found_variants$MutationKey_Hg38 == "chr13,28018485,G,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

4.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

4.1.1 Exon Skipping 20

Search: chr13:28015702-28023349

Show all the splice junctions containing the positions 28015702-28023349

colnames(GeneSJ)[grep("28015702_28023349",colnames(GeneSJ))]
## [1] "chr13_28015702_28023349"

Found: chr13:28015702-28023349

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr13_28015702_28023349
##   [1]  25  16   4   6   8   0  20  37  18  13  15  69  18   8  62  19  69  31
##  [19]   4   9  14  23  21  13  71  10   8   1   0  32  61  12  34   8   1  13
##  [37]  14  12  36   1  51  14 162  19   8  31   6   4  10  40 130   8   8  26
##  [55]   2  26   1  22  22   3   0   0   4  13  12   0  26   7  25  34   7  19
##  [73]   2  33   1  33   0  25   5  48  20  10  47  22  17  16   4   2   2   3
##  [91]  86  64   5   1  51  30  22  42  16   9  36  25  24  17  59   9  45  10
## [109]   0   0  14  23  15   3   6  57  28  21   0  35   8   2  27  12  41  26
## [127]   5   2  32  29  13  15   9   8  11   4   9   2  15   0  14  40   6  26
## [145]  12   9  11  52  28  33   3  23  34  23  25  34  52  14   6  12  16  60
## [163]   2  30  29   0  15  10  21  13   9   0  19  12   8   1  36  20  27   3
## [181]   4  71  22  45  26  13   2  67  23   0  14   3   7  11  37   3   8   7
## [199]   6  31  23   7  15   1  13  13  17  44 167  28  20   8   0  58  16   9
## [217]  11  17  15  23   0  29   7  11  33  44   5  43   6   3   8  16  11  88
## [235]  39   5   0   8   7   4   0 117  30  18  15  21  23   1  32  18   0  17
## [253]  17   9   5  39   3  10  15  32  23   6   4  10  16   7  16  33  14   0
## [271]  13  15   5   9   8  17  16  36  24  28  16  10  70   8   3  10  12   7
## [289]   6  18  10   0 124   4  14  27   0  14   1   6  10   5  30  20  21   1
## [307]  18   5   5  94  30   3  61   6   9  19   9   0  10  21   1   7  13  13
## [325]  17  35   0   5   3   7  53  21   8   4  24  19  30  18  25  24  25  36
## [343]  15  83  18  17  22  12   6  23  37  19   8   0   2   2  45  12  59  30
## [361]  20   0   0   7   7   5  23  15  18  10 119  33   5   0  11   3   1   0
## [379]  55  10   0  14   8  20   1   8   6  99  21   2   5  28   7   2  14  15
## [397]  19  12  37  12  37   0   5   2  95  21  21   2  46   4  33  12   7   6
## [415]  30   4  11   2  17   3  45  51   7  21  11  13  95  23  73  12  22  10
## [433]   0  15  13  14  17   7   5   0   8   2  10   1  17   9   0   9  35  12
## [451]   5  35  39  20   3   2  61

Samples with the SJ of interest:

table(GeneSJ$chr13_28015702_28023349>0) 
## 
## FALSE  TRUE 
##    33   424

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28023349 > 0])
## 
## MUT  WT 
##   1 423

Alternative SJ found in the single mutated sample with RNASeq data and in 423 WT samples.

4.1.2 Exon Skipping 19

Search: chr13:28018590-28024860

Show all the splice junctions containing the positions 28018590-28024860

colnames(GeneSJ)[grep("28018590_28024860",colnames(GeneSJ))]
## [1] "chr13_28018590_28024860"

Found: chr13:28018590-28024860

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr13_28018590_28024860
##   [1]  3  4  2  4 13  0  4  7  0  2  0  3  7  3  1  1  8  3  9  5  5  4  7  2 27
##  [26]  1  4  1  0  2  2 17  2  0  0 14  3  8  7  4  2  4  4  2  7 23  1 15  2  0
##  [51] 14  0  1  4  4  2  1  3 14  0  6  0  0 25  0  1  4  2  2  2  0  9  3  2  0
##  [76]  2  4  6  9 13  2  0 78  4 30  1  3  0  0  0  1 53  5  1 13  4  4  3  0  3
## [101] 15  3  5  3  4  1 24  0  0  0  0  1 24  0  0  9  4  1  0  2 16  0  2  0  3
## [126]  1 22  1  6 15  0  4  0  5  0  0  2  5  0  0  1  3 21  2  3  0  0 11  4  6
## [151]  0 21 91  0  6  8  3  6  0  2  2  9  1 14  6  0 23  5  2  2  0  0  6  1  1
## [176]  2  1  5  3 13  3  5  2  6  5  8  1 11  0  0  0  2  2  0  7  2  1  0  1  2
## [201]  1  5 20  0  0 11  1  4  3  6  9  4  1  2 22  3  0  2  2  8  1 25  6 11  1
## [226] 13  0 10  0  0  1 15 12  3  4  1  0  4  1  1  2  5  4  2  2  7 21  0  1 36
## [251]  2 26  3  0  3  1  0  0  6  6  2  4  0  2  1  2  3  6 22  0  9  7  1  7  1
## [276]  0  2  1 18  6  2  4  1 10  0  0  4  8  0  3  5 10  3  1  3  6  0  2  0  4
## [301]  0 23  3  0  5  0  4  5 76  8  3  0  3  0  0  1  8  2 11 10  0  1 14 17  5
## [326]  6  0  0  0  0  3  4  4  1  6  2  5  1 11  0 22  7  9 16  5  0  2  3  4  0
## [351]  8  6  8  0 11  0  6  4  7  2  7  1  0  7  2  0  7  5  2  1  8  3  0  1  3
## [376]  0  0  7  3  1  0 10  4 30  3  0 16  1  2  0 23 10  2  2  2  1  0  7  1  0
## [401]  5  0  0  2  3  4  2  1 19  0  2 16  0 11  3  7  4  0  2  6  5  7 18 15  4
## [426]  6 27  1  5  3  7  1  0  7  2  6  6 10  1  1  7 15  2  0  8  0  0 17  5  1
## [451]  2  8  4  3  2  1  0

Samples with the SJ of interest:

table(GeneSJ$chr13_28018590_28024860>0) 
## 
## FALSE  TRUE 
##   102   355

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28018590_28024860 > 0])
## 
## MUT  WT 
##   1 354

Alternative SJ found in the single mutated sample with RNASeq data and in 354 WT samples.

4.1.3 Exon Skipping 19+20

Search: chr13:28015702-28024860

Show all the splice junctions containing the positions 28015702-28024860

colnames(GeneSJ)[grep("28015702_28024860",colnames(GeneSJ))]
## [1] "chr13_28015702_28024860"

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr13_28015702_28024860
##   [1]  0  7  0  6  3  0  0  2  0  2  0  0  3  1  0  0  1  0  3  0  6  0  4  3 46
##  [26]  4  0  1  0  1  0  7  0  0  0  3  2  0  4  0  0  1  1  0  6  9  5 19  0  0
##  [51] 19  0  1  0  3  0  0  3  3  0  0  0  0 12  0  0  0  1  0  1  1  6  1  2  0
##  [76]  0  0  0  3  0  0  1 58  1  6  0  1  0  0  0  1  6  1  1  4  1  0  1  0  0
## [101]  5  0  1  4  0  0 16  0  0  0  0  2 11  0  1  0  0  2  0  0  4  0  1  0  1
## [126]  9  6  0  3  5  0  0  0  3  0  0  0  1  0  0  5  0  3  4  0  0  0  6  1  2
## [151]  0 19 36  0  4  2  2  1  0  2  0  7  0  7  1  0 16  3  0  3  3  1  2  5  1
## [176]  5  1  1  3  0  0  2  0 13  6  3  0  7  0  4  0  0  0  2  0  2  0  0  0  0
## [201]  0  3 32  0  1  3  0  1  7  3  6  5  0  0 17  6  0  0  4  8  0  7  0 11  0
## [226] 15  0  7  1  0  0 17 10  4  5  1  0  5  0  0  0  7  2  0  2  2 14  0  1  6
## [251]  0  2  0  0  0  0  1  0  0  0  0  1  0  0  0  2  0  3 15  0  2  5  0  0  0
## [276]  0  0  0 16  0  0  1  5  8  5  0  4  8  0  2  1  0  2  3  1  0  0  0  0  0
## [301]  0  6  4  0  0  0  1  3 47 21  0  0  0  1  0  0  4  0  5  2  0  0  5  6  0
## [326]  0  0  1  4  2  2  3  5  1  5  0  0  0  5  1 30  7  0  4  0  0  7  0  1  0
## [351] 15  0  7  0  2  0  4  5  3  0  0  0  0  9  0  0  2  7  2  0 25  8  0  2  4
## [376]  0  0  5  0  0  0  2  1 13  5  0 10  2  0  0 11  0  0  2  3  0  0  0  0  1
## [401]  0  0  1  0  6  0  2  0 15  0  0 15  0  8  1  2  1  0  0  0  8  0  7  7  0
## [426]  1 50  1  4  0  0  1  0  0  0  5  1  6  0  0  5  3  0  0  2  0  0  6  0  0
## [451]  0  0  0  2  2  0  0

Samples with the SJ of interest:

table(GeneSJ$chr13_28015702_28024860 >0) 
## 
## FALSE  TRUE 
##   224   233

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr13_28015702_28024860 > 0])
## 
## MUT  WT 
##   1 232

Alternative SJ found in the single mutated sample with RNASeq data and in 232 WT samples.

4.1.4 Canonical SJ

Exon 19-20 chr13:28018590-28023349

Exon 20-21 chr13:28015702-28018466; splice site chr13:28018466

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr13_28018590_28023349
##   [1]  748  302  315   13  308   11 1210 1229  515  322 1004 1831  552  236 2249
##  [16]  391 1778 1554  188  983  209 1002  400  170 1468  237  504   74    2 1069
##  [31] 3007  530 1178  265   12  601  637  417  976   43 1461  805 4251  743  367
##  [46] 1939  743   77  212  745 1789   92  361  892   57  745   12  152 1741  183
##  [61]  415   10  170  617  566   52 1533   46  701  867  406  847  151 1760   11
##  [76] 1111   77 1330  142 1288 1062  693 1273  929  778  481  116   55   21  114
##  [91] 1107 2697   43   47 1167 2015 1349  805 1035  836 2130 1040 1022  495 1614
## [106]  267 1061  465  105   11  500 1220  792  331  103 4216 1837  731   20 1017
## [121]  327   11  722  490  760  546  722   62 1790  768  280 1958  685  521  421
## [136]  185  374   87  785   20  304 1758  163  413  588   64  353 1082  889  710
## [151]  195  932 2657 1034 1274 1807 3147  621  532  234  927 1354   91  857 1132
## [166]   61 1048  604 2019  814  280   42  381  245  205  171 1883 1840  748  113
## [181]  420 1974  586 1124  427  321   91 2041  851   12  119   57   81  704 2238
## [196]   83  493  415  905 1041  800   59  297    3  324  240  972 1496 1595  928
## [211]  942  425   24  820  384   96  337 1391  918  414   11  707  679  288 1247
## [226]  850  101 2143  398  110  256 1118  356 1431  906  397    3  329  228  159
## [241]  231 2073 1121  918  708  342  481  120 1324  702    4 1110  797  706  425
## [256]  973  136  645  740 2412 1010  564  282  519  749   57  766 1000  407    3
## [271]  265  331  148  768  345  508  756 1405  359 1512 1055  369 1264  213   92
## [286]  360  480  178  252  332 1066  356 3096  165  563 1439   21  429   18 1032
## [301]  240  157 1159  777 1204  256  312  254  337 3073 1114  263  698  898  260
## [316]  868  361   78  445 1265   23  365  622  660  675 1180  132  323  198  222
## [331] 2166  622  312   82  894  566 1948  760 1076  874  741 1117 2441 4206  626
## [346]  785  362 1123  299  659  728 1091  286    3   90   33  860  230 2750  889
## [361] 1636   72   12   53  114  186 1028  397  290  429 6005 1146  137    8  108
## [376]   34  225  215 1918  611    0  818  212  522   33  657  408 1743 1460  206
## [391]  436 2691  196   50  560  686  498  741 1051  468 1921   12  503  253 1820
## [406] 1293  686   73 1496  502 2013  233  490  243 1777  192  560  331  683  154
## [421] 1384 1549  341  388  644  573 1174 1417 1674 1419 1142  795   22  654  466
## [436]  307  632  330  219   28  162  112  362   97  499  158   18  344 1260  804
## [451]  264 1254 1083  964  227   60 2238

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr13_28015702_28018466
##   [1]  648  324  222   48  270   14  907 1020  534  254  779 1725  565  209 2251
##  [16]  350 2036 1250  224  970  148  843  402  206 1714  203  407   69    3 1171
##  [31] 2450  504 1164  247   16  591  646  388 1007   42 1080  735 4491  735  421
##  [46] 1653  117  126  178  759 1867   78  374  847   47  584   10  177 1426  170
##  [61]  353    6  182  476  511   52 1076   33  664  891  322  909  160 1417    8
##  [76]  902   74 1090  167 1179  909  575 1083  819  764  361   75   49   26  112
##  [91]  894 2326   37   43  974 1571 1325  767 1047  635 1708  846  962  487 1520
## [106]  268  955  514   78    7  466  998  703  252   78 3551 1452  776   14 1074
## [121]  227   14  765  390  615  493  780   59 1446  784  231 1534  703  433  405
## [136]  192  393  101  675   28  328 1812  151  399  629   52  317  899  658  532
## [151]  202 1011 2468 1074 1229 1558 2548  496  563  208  824 1418  115  728  875
## [166]   53  853  575 1941  916  325   39  288  246  186  176 1472 1338  737  115
## [181]  337 2010  436 1074  340  293   59 1702  843    8  104   38   93  878 1870
## [196]   80  474  423  611 1214  789   76  216   10  360  214  781 1271 1360 1091
## [211]  838  469   32  704  441  106  346 1200  875  413   24  739  597  355  988
## [226]  969   92 2063  412   74  273  943  310 1603  709  342    8  242  222  151
## [241]  181 2257  950  833  554  280  618  126 1117  608    9 1024  775  607  424
## [256] 1144  139  742  719 2660  871  536  291  451  538   61  674 1003  448    2
## [271]  236  343  145  588  302  548  577 1499  354 1236  947  392 1355  186   91
## [286]  337  516  191  217  328  894   44 3326  166  566 1536    3  430   23  129
## [301]  211  140  998  547 1012  194  330  263  429 3303  788  235  581  865  278
## [316]  963  392   68  342 1059   10  280  611  629  565 1038   21  242  131  158
## [331] 2357  618  311   73 1013  499 1569  760  917  886  754  883  362 3557  679
## [346]  603  428  872  280  689  858  986  306    0  120   23  899  253 2176  703
## [361] 1264   57    9   54  136  202 1062  350  212  454 6471  962  126    9   88
## [376]   27  168  136 1579  652    0  880  208  514   35  560  347 1884 1520  215
## [391]  441 2191  164   35  581  796  519  595 1149  325 2032    9  518  258 1880
## [406] 1025  686   68 1254  349 2134  234  403  295 1918  221  450  261  518  117
## [421] 1096 1630  224  403  511  446 1374 1120 1400 1245 1213  786   27  567  379
## [436]  328  630  309  246   36  188  138  308   65  576  146   16  390 1144  609
## [451]  197  924  941 1002  176   48 1735

4.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx19_20 <- (GeneSJ$chr13_28018590_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx20_21 <- (GeneSJ$chr13_28015702_28018466)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_SE20 <- (GeneSJ$chr13_28015702_28023349)/GeneSJ$rowSum_SJtotal*100
GeneSJ$INCLUSION_Ex20 <- GeneSJ$chr13_28018590_28023349 + GeneSJ$chr13_28015702_28018466
GeneSJ$PSI_SE20 <- (GeneSJ$INCLUSION_Ex20)/(GeneSJ$chr13_28015702_28023349+GeneSJ$INCLUSION_Ex20)


GeneSJ$Normalized_SE19 <- (GeneSJ$chr13_28018590_28024860)/GeneSJ$rowSum_SJtotal*100
GeneSJ$INCLUSION_Ex19 <- GeneSJ$chr13_28018590_28023349 + GeneSJ$chr13_28023478_28024860
GeneSJ$PSI_SE19 <- (GeneSJ$INCLUSION_Ex19)/(GeneSJ$chr13_28018590_28024860+GeneSJ$INCLUSION_Ex19)

GeneSJ$Normalized_SE1920 <- (GeneSJ$chr13_28015702_28024860)/GeneSJ$rowSum_SJtotal*100
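As a worked illustration of the formulas above, the sketch below applies the same normalization and PSI arithmetic to two hypothetical samples. The read counts are invented, and only three junction columns are included, so the totals are smaller than they would be for the full gene.

```r
# Hypothetical read counts for two samples (values are made up).
toy <- data.frame(
  chr13_28018590_28023349 = c(80, 40),   # canonical exon 19-20 junction
  chr13_28015702_28018466 = c(100, 50),  # canonical exon 20-21 junction
  chr13_28015702_28023349 = c(20, 0)     # exon 20 skipping junction
)

# Total junction reads per sample (all columns whose name starts with "chr").
total <- rowSums(toy[, grep("^chr", names(toy))])

# Skipping junction as a percentage of all junction reads of the gene.
normalized_SE20 <- toy$chr13_28015702_28023349 / total * 100

# Inclusion reads are the sum of the two junctions flanking exon 20;
# PSI is inclusion / (skipping + inclusion), following the code above.
inclusion <- toy$chr13_28018590_28023349 + toy$chr13_28015702_28018466
psi_SE20 <- inclusion / (toy$chr13_28015702_28023349 + inclusion)

print(data.frame(total, normalized_SE20, psi_SE20))
```

In this toy case the first sample has 10% of its junction reads on the skipping junction and a PSI of 0.9, while the second sample shows no skipping and a PSI of 1.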

Download the normalized values for the assessed splice junctions of all the AML samples:

4.3 VAF

VAF of the mutated samples:

4.4 Plots

4.4.1 Static Dot Plots

Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466

Splicing alterations:

4.4.2 Interactive Dot Plots

Canonical splice junction: Exon 20-21 chr13:28015702-28018466; donor splice site chr13:28018466

Splicing alterations:

4.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

4.5 Statistical Analysis

SJCounts <- GeneSJ

4.5.1 Donor Loss

4.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"]
## W = 0.9067, p-value = 4.086e-16

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 5.304403

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx20_21[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.803404

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx20_21 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.5009997

4.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:455] = -5.3044, -4.4473, -4.2071,  ..., 5.2219, 6.4165
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.2872807
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.287280701754386"
print(paste0("Inferred Pvalue: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.287280701754386"
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx20_21")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <-  MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr13:28015702-28018466"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
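The percentile logic used here can be sketched on toy numbers. The values below are hypothetical. For a loss event the lower tail of the WT distribution is read off directly, whereas the gain and skipping sections of this report use one minus the percentile.

```r
# Hypothetical normalized expression values.
wt_expr  <- c(4.0, 5.0, 5.5, 6.0, 6.5)  # WT samples
mut_expr <- 4.8                          # single MUT sample

# Deviation from the WT mean, as in the analysis above.
mean_wt <- mean(wt_expr)
wt_ecdf <- ecdf(wt_expr - mean_wt)

# Fraction of WT samples with a deviation <= that of the MUT sample.
percentile <- wt_ecdf(mut_expr - mean_wt)

p_loss <- percentile      # lower tail: canonical junction loss
p_gain <- 1 - percentile  # upper tail: alternative junction gain

# Centring on the mean shifts every value equally, so the percentile
# is the same as the one computed on the raw values.
percentile_raw <- ecdf(wt_expr)(mut_expr)

print(c(percentile = percentile, p_loss = p_loss, p_gain = p_gain))
```

Only one of the five WT values lies at or below the MUT value, so the percentile is 0.2.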

Download the VAF, inferred percentiles and p-values of the mutated samples:

4.5.2 Exon Skipping 20

4.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"]
## W = 0.77609, p-value < 2.2e-16

Value of Mean Normalized Expression of the SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_SE20[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1719062

Normalized Expression Value of the SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_SE20[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.2504632

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_SE20 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.07855696

4.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:424] = -0.17191, -0.151, -0.14868,  ..., 0.8037, 1.1439
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.8267544
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.826754385964912"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.1732456
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.173245614035088"

Download the VAF, inferred percentiles and p-values of the mutated samples:

4.5.3 Exon Skipping 19

4.5.3.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_SE19[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_SE19[SJCounts$GROUP == "WT"]
## W = 0.56862, p-value < 2.2e-16

Value of Mean Normalized Expression of the SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_SE19[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.07528691

Normalized Expression Value of the SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_SE19[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.01715501

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_SE19 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.0581319

4.5.3.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:355] = -0.075287, -0.072386, -0.072168,  ..., 0.80191, 1.2068
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.4232456
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.423245614035088"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.5767544
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.576754385964912"
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.4232456
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_SE19")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr13:28018590-28024860"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

4.5.4 Exon Skipping 19 + 20

4.5.4.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"]
## W = 0.3588, p-value < 2.2e-16

Value of Mean Normalized Expression of the SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_SE1920[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.04698132

Normalized Expression Value of the SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_SE1920[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.01372401

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_SE1920 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.03325731

4.5.4.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:233] = -0.046981, -0.045737, -0.044446,  ..., 1.2688, 1.4076
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6074561
print(paste0("MUT Percentile: ", v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "MUT Percentile: 0.607456140350877"
1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.3925439
print(paste0("Inferred Pvalue: ", 1-v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])))
## [1] "Inferred Pvalue: 0.392543859649123"
v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6074561
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_SE1920")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr13:28015702-28024860"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

5 IDH1

Three (3) variants: chr2,208248389,G,A, chr2,208248389,G,T & chr2,208248389,G,C

Variants found in 30 patients of the BeatAML cohort (35 samples).

  • Patients with IDH1 chr2,208248389,G,A variant: 23 patients (27 samples)
  • Patients with IDH1 chr2,208248389,G,T variant: 5 patients (5 samples)
  • Patients with IDH1 chr2,208248389,G,C variant: 2 patients (3 samples)

Patients with the variant and RNASeq for validation:

  • Patients with IDH1 chr2,208248389,G,A variant and RNASeq for validation: 14 patients (15 samples)
  • Patients with IDH1 chr2,208248389,G,T variant and RNASeq for validation: 4 patients (4 samples)
  • Patients with IDH1 chr2,208248389,G,C variant and RNASeq for validation: 2 patients (3 samples)

The splicing alterations being assessed are:

  • Donor Gain: chr2:208245425-208248422, found in the mutated samples.
  • Exon Skipping 4: chr2:208245425-208251429, found in the splice junction collection, not found in the mutated samples.
  • Donor Loss: chr2:208245425-208248368; donor splice site chr2:208248368.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"IDH1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="IDH1" & found_variants$MutationKey_Hg38 %in% c( "chr2,208248389,G,A","chr2,208248389,G,T" ,"chr2,208248389,G,C"),]
R132C <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,A" & samples_df$Validable =="Validable"] #n=15 G>A
R132G <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,C" & samples_df$Validable =="Validable"] #n=3 G>C
R132S  <- samples_df$RNA_Sample[samples_df$MutationKey_Hg38 == "chr2,208248389,G,T" & samples_df$Validable =="Validable"] #n=4 G>T

# All validable RNA samples carrying any of the three IDH1 variants
cases <- c(R132C, R132G, R132S)

# MUT if the sample carries any of the variants, WT otherwise
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases, "MUT", "WT")


# Per-variant label, used for the group comparisons
GeneSJ$Variant_status <- ifelse(GeneSJ$sample_id %in% R132C, "R132C",
                         ifelse(GeneSJ$sample_id %in% R132G, "R132G",
                         ifelse(GeneSJ$sample_id %in% R132S, "R132S",
                                "WT")))

5.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

5.1.1 Donor Gain

Search: chr2:208245425-208248422

Show all the splice junctions containing the positions 208245425-208248422

colnames(GeneSJ)[grep("208245425_208248422",colnames(GeneSJ))]
## [1] "chr2_208245425_208248422"

Found: chr2_208245425_208248422
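The column lookup can be made slightly stricter by anchoring the pattern to the `chr_SJstart_SJend` naming scheme. This is a minimal sketch; the helper `find_sj` is hypothetical, and the column names are the IDH1 junctions mentioned in this section. Positions are passed as character strings so that large coordinates are never formatted in scientific notation.

```r
# Column names as they appear in the junction table (subset for illustration).
sj_columns <- c("sample_id",
                "chr2_208245425_208248368",
                "chr2_208245425_208248422",
                "chr2_208248661_208251429")

# Hypothetical helper: match junction columns by start and/or end position.
# A missing position matches any run of digits.
find_sj <- function(col_names, start = NULL, end = NULL) {
  start_pat <- if (is.null(start)) "[0-9]+" else start
  end_pat   <- if (is.null(end)) "[0-9]+" else end
  pattern   <- paste0("^chr[^_]+_", start_pat, "_", end_pat, "$")
  grep(pattern, col_names, value = TRUE)
}

by_end   <- find_sj(sj_columns, end = "208248422")
by_start <- find_sj(sj_columns, start = "208245425")
print(by_end)
print(by_start)
```

Anchoring with `^` and `$` avoids accidental hits on columns that merely contain the digits as a substring of a longer coordinate.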

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_208245425_208248422
##   [1] 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
##  [38] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0
## [445] 0 0 0 0 0 1 0 0 0 0 0 0 2

Samples with the SJ of interest:

table(GeneSJ$chr2_208245425_208248422>0) 
## 
## FALSE  TRUE 
##   427    30

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr2_208245425_208248422 > 0])
## 
## MUT  WT 
##   7  23

Alternative SJ found in the mutated samples.

5.1.2 Exon Skipping 4

Search: chr2:208245425-208251429

Show all the splice junctions containing the positions 208245425-208251429

colnames(GeneSJ)[grep("208245425_208251429",colnames(GeneSJ))]
## [1] "chr2_208245425_208251429"

Found: chr2_208245425_208251429

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_208245425_208251429
##   [1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 1 0 0 0 0 0 0 0 0

Samples with the SJ of interest:

table(GeneSJ$chr2_208245425_208251429>0) 
## 
## FALSE  TRUE 
##   454     3

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr2_208245425_208251429 > 0])
## 
## WT 
##  3

Alternative SJ not found in the mutated samples of the splice junction collection.

5.1.3 Canonical SJ

Exon 3-4: chr2:208248661-208251429

Exon 4-5: chr2:208245425-208248368; splice site chr2:208248368

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_208248661_208251429
##   [1]  730   57  172    4   15  197  349  485  352  132  270  379  109  114  186
##  [16]  100  233  393   11  193   37  705  110   50   37   90  304    7    8  336
##  [31]  464   41  353  231  184   24  269   38   43   38  779  110  777  141   82
##  [46]   85  316   32   50  399   55   63  154  257   15  222    4   69   77  229
##  [61]   58   16  179   25  453   15   97   26  217  103  155   82   35  244   57
##  [76]  295   86  461   20  278  416  300   20  312   75  206  125  177  120   87
##  [91]  667  133   24   52   57  681  395  299  275  917   77  544  111   68  488
## [106]  245  161  164   57   33  200  665   71  426   43  373  638  155  165  172
## [121]   54  105   98  294  470   59   27  527  203   61  161  106  925  208  333
## [136]  218   61   55  161   19  114  227    8   19  255   47  119   50 1051  171
## [151]  377  121   44  150  298  188  506   93   31  312  636   89  129   90  318
## [166]  135  232   71  116   95   52   77  110    9  102   92  326  969  205   46
## [181]  509  257   60  157  114  100  151  248  178   40   58   37   56  119  379
## [196]    7  347  307  570  192  194   10  260   26  384   14  434  244  352   49
## [211]   62   32   12  816   35   18   23  745  134   37   12   27  220  151  281
## [226]   40  221   49  168  271  372  110   86  317  317  320    7  107  278  249
## [241]  114   81  147  379  146   13   58    7  328   50    8   74  196  381  176
## [256]  414   74  184  128  104  174   46   11  238  203   19  201  176   23   52
## [271]  180   35  259  251  181  228  380  175   71   94  513  210  124   51   17
## [286]  295   30   30   53   84  183  155   85  143  348  458   18  318   17  340
## [301]   88   37  118 1517   88  297  113   16   89  193 2470  124  415   50  269
## [316]  254   20  210  118   94   85  439   53   32  322  532  590  316  365  104
## [331]  141   13   45   93  168   56  127  136  121  454   57   20  783  680  319
## [346]  655  200  428   74   89   42  121   74  133   19   43   33   89  103  110
## [361]  490   13   31   29   80  282  238  147   27  359  662  168  130    4   33
## [376]   94  287   27 1409  173    0   86   37   49   22  350   13  229  180  169
## [391]  147  229  467    3   24  181  742  245  261  583  437   23   90  157   95
## [406]  136   19   52  184   97  591   25  197   38  446   32  183  262  242   43
## [421]  103  277   62   79   83  134   21  169  220  683  182  158    1  143  127
## [436]   70  191   14  137   33   25   80  701  140  255  143   41   19  596  891
## [451]  244  377  437  317   68   36 2812

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_208245425_208248368
##   [1] 1035   76  129   10   23  181  382  511  638  148  317  509  143  124  199
##  [16]   93  327  399   36  284   24  615  143   75   51   81  373   13   20  428
##  [31]  555   87  613  408  148   50  464   36   65   39  797  156 1036  273  180
##  [46]  102  117   44   39  647  116   64  248  283   25  233   12  112   90  390
##  [61]   78   12  259   36  615   26  103   34  364  157  121  165   36  353   92
##  [76]  429   80  557   38  440  510  420   23  442  101  201   88  280  134   98
##  [91]  845  124   41   59   64  761  585  393  470  930  100  474  170  116  612
## [106]  313  297  269   41   20  234  663   77  487   70  559  746  221  218  218
## [121]   52  136  154  314  558  115   33  771  249  110  222  153 1654  236  572
## [136]  379   91   72  192   46  253  397   30   23  330   61  166   66 1200  256
## [151]  553  195   71  234  330  246  673  125   35  282  844  123  202  125  281
## [166]  185  248  124  178  132   80  106   93   10  119  139  281  901  386   87
## [181]  465  359   51  269  100   62  174  286  226   26   65   21   49  262  629
## [196]   10  598  404  543  231  333   12  287   30  630   34  374  320  529   62
## [211]   52   52   21  948   42   23   55 1075  161   80   23   30  249  151  349
## [226]   45  265  107  282  322  641   60  109  572  257  349   20   63  578  320
## [241]  124   86  152  414  124   25   64   19  499   67   14  130  366  573  277
## [256]  559  101  264  200  138  332   62   27  377  173   37  166  366   32  117
## [271]  272   55  455  304  258  268  419  264   90   89  736  295  166   62   11
## [286]  397   44   20   43  134  210   43  153  243  668  672    1  493   22   87
## [301]  129   37  158 1283  132  268  199   22   91  383 2332  140  532   82  384
## [316]  459   27  154  176   93   72  545   53   30  460  430  156  415  316  102
## [331]  212   37   36   54  213   74  106  232  171  675  110   24  240  961  444
## [346]  612  289  494   60  118   45   77  100  166   52   43   56  143  139  111
## [361]  477   13   19   41  134  447  384  209   46  478  955  199  226   23   52
## [376]  115  262   10 2021  206    1  121   46  125   24  317   22  335  288  233
## [391]  261  214  377    8   35  312 1019  180  423  465  633   22  134  301  127
## [406]  175   55   59  208   73  874   40  276   32  894   49  215  281  275   65
## [421]   59  352   42  154  116  173   53   82  253  991  180  279    7  163  123
## [436]   74  314   38  222   57   60   71  996  175  291  222   92   27  823  947
## [451]  267  345  437  469   72   27 2835

5.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx3_4 <- (GeneSJ$chr2_208248661_208251429)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx4_5 <- (GeneSJ$chr2_208245425_208248368)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_DGEx4 <- (GeneSJ$chr2_208245425_208248422)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

5.3 VAF

VAF of the mutated samples:

5.4 Plots

5.4.1 Static Dot Plots

Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368

Splicing alterations:

5.4.2 Interactive Dot Plots

Canonical splice junction: Exon 4-5: chr2:208245425-208248368; donor splice site chr2:208248368

Splicing alteration:

5.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

5.5 Statistical Analysis

SJCounts <- GeneSJ

5.5.1 Donor Gain

5.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"]
## W = 0.16047, p-value < 2.2e-16
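A very low W statistic is what one would expect from a zero-inflated distribution: only 23 of the 435 WT samples have any read for this junction. The sketch below uses hypothetical values to illustrate how such a distribution fails the Shapiro-Wilk test, which is presumably why rank-based summaries are used in the rest of the analysis.

```r
# Hypothetical zero-inflated values: most samples have no reads for the junction.
toy_expr <- c(rep(0, 40), seq(0.1, 1, by = 0.1))

# Shapiro-Wilk test of normality; a small p-value rejects normality.
sw <- shapiro.test(toy_expr)
print(sw$statistic)
print(sw$p.value)

# Fraction of samples with zero expression.
frac_zero <- mean(toy_expr == 0)
print(frac_zero)
```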

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_DGEx4[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.00342833

Normalized Expression Value of the Alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_DGEx4[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 0.25252525 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
##  [7] 0.00000000 0.20661157 0.30303030 0.00000000 0.07942812 1.21130552
## [13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.52151239
## [19] 0.00000000 0.00000000 0.00000000 0.52521008

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_DGEx4 - mean_WT_SJi

Difference in the MUT samples: deviation of the normalized expression of each MUT sample from the mean normalized WT expression

SJCounts[SJCounts$GROUP == "MUT", c("sample_id", "Difference")]
##     sample_id  Difference
## 16    BA2035R  0.24909692
## 21    BA2046R -0.00342833
## 40    BA2088R -0.00342833
## 113   BA2273R -0.00342833
## 120   BA2286R -0.00342833
## 127   BA2302R -0.00342833
## 154   BA2387R -0.00342833
## 170   BA2421R  0.20318324
## 186   BA2459R  0.29960197
## 215   BA2514R -0.00342833
## 219   BA2523R  0.07599979
## 232   BA2552R  1.20787719
## 288   BA2695R -0.00342833
## 317   BA2756R -0.00342833
## 323   BA2769R -0.00342833
## 333   BA2798R -0.00342833
## 334   BA2804R -0.00342833
## 352   BA2837R  0.51808406
## 378   BA2883R -0.00342833
## 392   BA2911R -0.00342833
## 398   BA2926R -0.00342833
## 428   BA2999R  0.52178175

5.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:24] = -0.0034283, 0.0084984, 0.011035,  ..., 0.22805, 0.24974
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.9977011 0.9471264 0.9471264 0.9471264 0.9471264 0.9471264 0.9471264
##  [8] 0.9954023 1.0000000 0.9471264 0.9908046 1.0000000 0.9471264 0.9471264
## [15] 0.9471264 0.9471264 0.9471264 1.0000000 0.9471264 0.9471264 0.9471264
## [22] 1.0000000
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DGEx4")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr2:208245425-208248422"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
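`ecdf()` evaluates the fraction of values less than or equal to its argument, so a MUT sample with zero reads receives the fraction of WT samples that also have zero reads (412 of 435, about 0.947, in the output above). That value carries no evidence of a gain, which is consistent with setting it to NA. The sketch below illustrates the effect on hypothetical values and, as an alternative that is not the method used in this report, an upper-tail fraction that counts ties.

```r
# Hypothetical normalized values: 8 of 10 WT samples have no reads.
wt  <- c(rep(0, 8), 0.1, 0.3)
mut <- c(0, 0.2, 0.5)

# ECDF percentile: fraction of WT values <= each MUT value.
wt_ecdf    <- ecdf(wt)
percentile <- wt_ecdf(mut)

# Alternative (assumption, not the report's method): fraction of WT values
# greater than or equal to each MUT value, so ties count against the MUT sample.
upper_tail <- sapply(mut, function(x) mean(wt >= x))

print(data.frame(mut, percentile, upper_tail))
```

With this tie-aware tail a zero-read sample gets an upper-tail fraction of 1, so no masking step is needed.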

Download the VAF, inferred percentiles and p-values of the mutated samples:

5.5.1.3 Kruskal Wallis

Kruskal Wallis:

kruskal.test(Normalized_DGEx4 ~ Variant_status, data = SJCounts)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Normalized_DGEx4 by Variant_status
## Kruskal-Wallis chi-squared = 69.916, df = 3, p-value = 4.448e-15

Pairwise comparisons:

pairwise.wilcox.test(SJCounts$Normalized_DGEx4, SJCounts$Variant_status)
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  SJCounts$Normalized_DGEx4 and SJCounts$Variant_status 
## 
##       R132C R132G R132S  
## R132G 0.932 -     -      
## R132S 0.005 0.131 -      
## WT    0.045 0.932 8.9e-16
## 
## P value adjustment method: holm

Detailed:

pkw <- pairwise_wilcox_test(SJCounts,Normalized_DGEx4 ~ Variant_status)
pkw
## # A tibble: 6 × 9
##   .y.         group1 group2    n1    n2 statistic        p    p.adj p.adj.signif
## * <chr>       <chr>  <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
## 1 Normalized… R132C  R132G     15     3        27 4.66e- 1 9.32e- 1 ns          
## 2 Normalized… R132C  R132S     15     4         1 9.92e- 4 5   e- 3 **          
## 3 Normalized… R132C  WT        15   435      3771 1.1 e- 2 4.5 e- 2 *           
## 4 Normalized… R132G  R132S      3     4         0 4.4 e- 2 1.31e- 1 ns          
## 5 Normalized… R132G  WT         3   435       618 6.87e- 1 9.32e- 1 ns          
## 6 Normalized… R132S  WT         4   435      1739 1.49e-16 8.94e-16 ****
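
The adjusted p-values in the two tables above follow Holm's step-down procedure, which the output reports as the adjustment method. As a rough check, the sketch below feeds the rounded raw p-values printed in the detailed table to base R's `p.adjust`; because the inputs are rounded, the results only approximate the `p.adj` column. The vector names are illustrative and not part of the report.

```r
# Raw p-values as printed (rounded) in the detailed pairwise table above.
p_raw <- c(
  R132C_vs_R132G = 4.66e-1,
  R132C_vs_R132S = 9.92e-4,
  R132C_vs_WT    = 1.1e-2,
  R132G_vs_R132S = 4.4e-2,
  R132G_vs_WT    = 6.87e-1,
  R132S_vs_WT    = 1.49e-16
)

# Holm: sort ascending, multiply the i-th smallest by (m - i + 1),
# then enforce monotonicity with a running maximum (capped at 1).
p_holm <- p.adjust(p_raw, method = "holm")
signif(p_holm, 3)
```

The running maximum is why R132G vs WT, with a raw p-value of 0.687, ends up with the same adjusted value as R132C vs R132G.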

5.5.2 Donor Loss

5.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"]
## W = 0.93166, p-value = 3.253e-13

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 16.20588

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"]
MUT_SJi
##  [1] 11.742424  7.792208 18.224299 15.308151 14.082687 15.207373 16.283925
##  [8] 13.636364  9.393939 13.592233 12.787927  8.075370  8.771930 11.392405
## [15] 12.649165 12.456747  9.872029 10.039113  4.716981 13.333333 11.029412
## [22]  8.613445

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx4_5 - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
##  [1]  -4.46345182  -8.41366827   2.01842300  -0.89772497  -2.12318872
##  [6]  -0.99850279   0.07804878  -2.56951242  -6.81193667  -2.61364305
## [11]  -3.41794913  -8.13050594  -7.43394624  -4.81347100  -3.55671138
## [16]  -3.74912866  -6.33384681  -6.16676263 -11.48889493  -2.87254273
## [21]  -5.17646430  -7.59243068

5.5.2.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:430] = -14.319, -11.052, -10.923,  ..., 9.6562, 11.191
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
##  [1] 0.045977011 0.018390805 0.777011494 0.335632184 0.165517241 0.328735632
##  [7] 0.471264368 0.131034483 0.025287356 0.128735632 0.075862069 0.020689655
## [13] 0.022988506 0.045977011 0.071264368 0.064367816 0.029885057 0.032183908
## [19] 0.002298851 0.105747126 0.045977011 0.022988506
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx4_5")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction  <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:208245425-208248368"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the VAF, inferred percentiles and p-values of the mutated samples:

5.5.2.3 Kruskal Wallis

Kruskal Wallis:

kruskal.test(Normalized_CanonEx4_5 ~ Variant_status, data = SJCounts)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Normalized_CanonEx4_5 by Variant_status
## Kruskal-Wallis chi-squared = 35.08, df = 3, p-value = 1.172e-07

Pairwise comparisons:

pairwise.wilcox.test(SJCounts$Normalized_CanonEx4_5, SJCounts$Variant_status)
## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  SJCounts$Normalized_CanonEx4_5 and SJCounts$Variant_status 
## 
##       R132C   R132G  R132S 
## R132G 0.6857  -      -     
## R132S 0.4974  0.6857 -     
## WT    5.7e-06 0.6857 0.0061
## 
## P value adjustment method: holm

Detailed:

pkw <- pairwise_wilcox_test(SJCounts,Normalized_CanonEx4_5 ~ Variant_status)
pkw
## # A tibble: 6 × 9
##   .y.           group1 group2    n1    n2 statistic       p   p.adj p.adj.signif
## * <chr>         <chr>  <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
## 1 Normalized_C… R132C  R132G     15     3        18 6.54e-1 6.87e-1 ns          
## 2 Normalized_C… R132C  R132S     15     4        46 1.24e-1 4.96e-1 ns          
## 3 Normalized_C… R132C  WT        15   435       834 9.44e-7 5.66e-6 ****        
## 4 Normalized_C… R132G  R132S      3     4        10 2.29e-1 6.87e-1 ns          
## 5 Normalized_C… R132G  WT         3   435       404 2.56e-1 6.87e-1 ns          
## 6 Normalized_C… R132S  WT         4   435        53 1   e-3 6   e-3 **

6 U2AF1 chr21,43094667,T,C

Variant found in 3 patients of the BeatAML cohort (3 samples).

  • Patients with U2AF1 chr21,43094667,T,C variant: 3 patients (3 samples)
  • Patients with the variant and RNASeq for validation: 3 patients (3 samples)

The splicing alterations being assessed are:

  • Donor Gain: chr21:43094564-43094666, found in the mutated samples.
  • Exon 6 Skipping: chr21:43094564-43095437, not found in the splice junction collection.
  • Exon 5 Skipping: chr21:43094789-43095693, found in the mutated samples.
  • Donor Loss: chr21:43094564-43094654; donor splice site chr21:43094654.
  • Canonical SJ Intron 4-5: chr21:43095537-43095693.
  • Canonical SJ Intron 5-6: chr21:43094789-43095437.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_pathUM <- paste0(extractedSJ_dir_in,"U2AF1_UM_annotSJ.tsv")
extractedSJ_pathMM <- paste0(extractedSJ_dir_in,"U2AF1_MM_annotSJ.tsv")
GeneSJ_UM <- read.delim(extractedSJ_pathUM, sep ="\t")
GeneSJ_MM <- read.delim(extractedSJ_pathMM, sep ="\t")

GeneSJ <- GeneSJ_UM[,grep("chr", names(GeneSJ_UM))] + GeneSJ_MM[,grep("chr", names(GeneSJ_MM))]
GeneSJ$INDEX <- GeneSJ_UM$INDEX
GeneSJ$sample_id <- GeneSJ_UM$sample_id
GeneSJ$case_id <- GeneSJ_UM$case_id
GeneSJ$file_id.BAM <- GeneSJ_UM$file_id.BAM
GeneSJ$file_name.BAM <- GeneSJ_UM$file_name.BAM
GeneSJ$file_id.STAR.SJCounts <- GeneSJ_UM$file_id.STAR.SJCounts
GeneSJ$file_name.STAR.SJCounts <- GeneSJ_UM$file_name.STAR.SJCounts
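
The element-wise sum above is only valid if both tables list the same samples in the same row order and the same junction columns in the same column order. The sketch below checks this before adding, using toy tables with made-up counts. `sum_sj_counts` is a hypothetical helper, and reading UM and MM as uniquely mapped and multi-mapped read counts is an assumption based on the file names.

```r
# Hypothetical helper: add UM and MM junction counts only after confirming
# that rows and junction columns line up.
sum_sj_counts <- function(um, mm, id_col = "sample_id") {
  sj_cols <- grep("^chr", names(um), value = TRUE)
  stopifnot(
    identical(um[[id_col]], mm[[id_col]]),                     # same samples, same order
    identical(sj_cols, grep("^chr", names(mm), value = TRUE))  # same junctions, same order
  )
  out <- um[, sj_cols, drop = FALSE] + mm[, sj_cols, drop = FALSE]
  # Carry over the non-junction (metadata) columns from the UM table.
  cbind(out, um[, setdiff(names(um), sj_cols), drop = FALSE])
}

# Toy tables with made-up counts.
um <- data.frame(sample_id = c("S1", "S2"), chr21_100_200 = c(5, 0), chr21_100_300 = c(1, 2))
mm <- data.frame(sample_id = c("S1", "S2"), chr21_100_200 = c(1, 1), chr21_100_300 = c(0, 3))
combined <- sum_sj_counts(um, mm)
combined
```

If either check fails, the call stops instead of silently adding counts from mismatched samples or junctions.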

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="U2AF1" & found_variants$MutationKey_Hg38 == "chr21,43094667,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

6.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

6.1.1 Exon 6 Donor Gain

Search: chr21:43094564-43094666

Show all the splice junctions containing the position chr21:43094564-43094666

colnames(GeneSJ)[grep("43094564_43094666",colnames(GeneSJ))]
## [1] "chr21_43094564_43094666"

Found: chr21:43094564-43094666

Reads in all AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr21_43094564_43094666
##   [1]  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0
##  [26]  0  1  0  0  0  0  2  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  0
##  [51]  0  1  0  0  1  0  0  2  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
##  [76]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  1
## [101]  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [126]  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0 10
## [151]  0  0  0  0  0  2  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0
## [176]  0  0  0  0  0  0  0 10  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0
## [201]  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## [226]  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0
## [251]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [276]  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## [301]  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  0
## [326]  0  0  0  0  0  0  1  0  0  0  0  0  0  1  1  0  0  0  2  0  0  0  0  0  0
## [351]  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
## [376]  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [401]  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## [426]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## [451]  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr21_43094564_43094666>0) 
## 
## FALSE  TRUE 
##   416    41

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr21_43094564_43094666 > 0])
## 
## MUT  WT 
##   3  38

Alternative SJ found in all 3 mutated samples, and also in 38 of the 454 WT samples.
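
The two tables above can be read together as one cross-tabulation of sample group against junction presence. The sketch below rebuilds it from the reported counts only (457 samples, 41 with at least one read, 3 of them MUT); the object names are illustrative.

```r
# Counts taken from the tables above: 41 samples carry the junction
# (3 MUT, 38 WT) and 416 do not (all WT, since every MUT sample carries it).
sj_by_group <- matrix(
  c(3, 0,
    38, 416),
  nrow = 2, byrow = TRUE,
  dimnames = list(GROUP = c("MUT", "WT"), junction = c("present", "absent"))
)

# Fraction of samples in each group that show at least one junction read.
prop_present <- prop.table(sj_by_group, margin = 1)[, "present"]
round(prop_present, 3)
```

Presence alone does not separate the groups here, since some WT samples also carry one or two reads; the read counts and the normalized expression are compared in the later sections.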

6.1.2 Exon 6 Skipping

Search: chr21:43094564-43095437

colnames(GeneSJ)[grep("43094564",colnames(GeneSJ))]
## [1] "chr21_43094564_43094654" "chr21_43094564_43094666"
## [3] "chr21_43094564_43094670"
colnames(GeneSJ)[grep("43095437",colnames(GeneSJ))]
## [1] "chr21_43094723_43095437" "chr21_43094789_43095437"
## [3] "chr21_43094825_43095437" "chr21_43095002_43095437"
## [5] "chr21_43095024_43095437" "chr21_43095051_43095437"

Show all the splice junctions containing the position 43094564_43095437

colnames(GeneSJ)[grep("43094564_43095437",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.
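
Because `grep` matches partial patterns, the lookups above return every column containing the search string. When both coordinates are known, an exact-name lookup is an alternative. The sketch below uses a hypothetical helper, `find_sj`, on a toy table that reuses three of the column names listed above with made-up counts.

```r
# Hypothetical helper: return the junction column matching exact coordinates,
# or character(0) when the junction is absent from the collection.
find_sj <- function(sj_table, chrom, start, end) {
  # format() with scientific = FALSE avoids names such as "chr21_4.3e+07_...".
  key <- paste(chrom,
               format(start, scientific = FALSE, trim = TRUE),
               format(end, scientific = FALSE, trim = TRUE),
               sep = "_")
  intersect(key, colnames(sj_table))
}

# Toy table holding three of the column names listed above (made-up counts).
toy_sj <- data.frame(
  chr21_43094564_43094654 = c(576, 706),
  chr21_43094564_43094666 = c(0, 1),
  chr21_43094789_43095437 = c(652, 340)
)

find_sj(toy_sj, "chr21", 43094564, 43094666)  # donor gain junction
find_sj(toy_sj, "chr21", 43094564, 43095437)  # exon 6 skipping junction
```

An empty result corresponds to the "not found in the splice junction collection" outcome reported for exon 6 skipping.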

6.1.3 Exon 5 Skipping

Search: chr21:43094789-43095693

Show all the splice junctions containing the position chr21:43094789-43095693

colnames(GeneSJ)[grep("43094789_43095693",colnames(GeneSJ))]
## [1] "chr21_43094789_43095693"

Found: chr21:43094789-43095693

Reads in all AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr21_43094789_43095693
##   [1]   0  84  16 669 134  13  13  14  30  59   7   7 154 377   8  29  25  12
##  [19] 349  18 223  14  53 832 313  88 112  22 248  17  10 166  11  13  31  97
##  [37]  22 166  62 313  13  33  25   2  96 226  57 349  28   8  51 296  38   0
##  [55]  99  10 269 380  72  11 144 218  16 103  10 114  15  49   0  33  16  55
##  [73] 276  28  54  30 446  29 357   9  25   4 557  14 128  14 168  15 124  20
##  [91]  33  44 430 216 112  14  10  17   4  13  40   0   0  66  14  11  83  74
## [109]  16  59  81  10 229  11  47  16   9  10  36  36 141 282  45  42  14 229
## [127] 118  16  14 134  48  24  11  22  54  10  82 113  30 162 164  13 485  75
## [145]   4 174   8 364  14  11   8 267 114  29  23  37  64  25  18  65   8 114
## [163]  16 107  41  12 191  98   9  12 340  12  47 251   6 355  33  26  15  27
## [181]   0  27  24  45 229  35  38 136   0  58  17 148 266  28  19 114  17   7
## [199]  10   0   6 339 177 536  20  25  11  29  25 188  37  66 121  15 513 237
## [217] 131  17  16 176 208 103  30 164   8 147  44  40   3  14  14  92 338  68
## [235]  24  20 359  45   0  14  19  99  16  11  15 585 169   0   5  53 178 195
## [253]  11  14  10  34 137  17  26   2  17  32  26  11  12 302  18  21  54  35
## [271] 306  64  44   8  48   3  18   6  61  10  16  53  98 249 143  12 315 110
## [289]  24 114   8 210 216  74  26   7  84   0  69  29  17 149 167  26  19  13
## [307]  70 176  63 160  16   0  41  17  49   0  96  40 147 159 127  17 162  54
## [325]  28  10  33  43  24  75  28  82 220  40 114  35   0  12  76   3 103  11
## [343]  24  47  18   7  35   9  26  46 182  45 191 144 243 186 123  79  73  81
## [361]  15 319 114 266 189  15   6 319 168   7  24  87  30 344 324  77  28 153
## [379]  12  12   0  46  46 158 276   8 175   9   8  31  26   7   8 168  96  15
## [397]  13  21  16  11  10  92  29  34  36   4  30 153  62  22  11  81  11 117
## [415]  11 201  46  20  12 209  24  25 334 219  19  83 198  21  15   0  15  12
## [433]   2  25  19 263  37 174   0 297 177  55  25  46  84  17  35 265   7  12
## [451]  14  31  13  33  65 144   9

Samples with the SJ of interest:

table(GeneSJ$chr21_43094789_43095693>0) 
## 
## FALSE  TRUE 
##    17   440

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr21_43094789_43095693 > 0])
## 
## MUT  WT 
##   3 437

Alternative SJ found in all 3 mutated samples, and also in 437 of the 454 WT samples.

6.1.4 Canonical SJ

Exon 4-5: chr21:43095537-43095693

Exon 5-6: chr21:43094789-43095437

Exon 6-7: chr21:43094564-43094654; splice site chr21:43094654

Reads in all AML samples (mutated and non-mutated) for the canonical intron 4-5 splice junction:

GeneSJ$chr21_43095537_43095693 #SJ Intron 4 to 5
##   [1] 1253  455 1421  125  404 1927 2803 2099  639 1452 1414 1220  627 1164  970
##  [16] 1107 1020 1872 1023  882  522 1768  664  569  582  888 1072  263  718  947
##  [31] 2420  698 1159  712  691  806  636  782  671 1239 2395 1832 1599 1156  690
##  [46] 1631  927  485  697  366  792  290  380 2040  778 1613  542 1363 1472 1722
##  [61]  777  226  608 1225  617  551 1298  872  950 1134 2314  440  419 1599 1182
##  [76] 1423  901 2931  494 1400 2391  783  363 1523  830 1335  517 1352  579 1143
##  [91] 2455 1728 1625 1237  602 1636  899 1266  690 1533 1065 1303 1050  449 1162
## [106]  904 1324  738  284  690 2418  892  900 3056  774 1590 2807  914  843 1020
## [121]  575 1287  780 1306 2776  664  738  815  909  756  809  780  769 1525  508
## [136]  890  466  238 1610  190  120 1258  882  669 1260  591  485  672 2135 1219
## [151] 1035 1591  822 1050 1698 1010 2292 1044  346 1105  908  709  649  709 2103
## [166]  832  617  564  709  673  281 1090  885  506  908 1241 2369 1407 1239 1253
## [181] 1214 1234 2230  418  484  676 1343 1301  877  718 1044  513  895  669 1952
## [196]  172  819  734 1955  650  866  554  903  487  884  790 1810 2178  929  470
## [211]  821  734  208 1674  267  368  330 1368  983  708  428 1546 1134  891 1279
## [226]  332 1873  444  893 1537 2852 1211 1169  829 1658  890  402 1443  462 1370
## [241]  894  602 1034 1414  941  308 1437  449  464  705  388  771 1147 1532  647
## [256]  913  635  853 2211  927 1050  588  627 1316 1203  421 1350 1421  599 1050
## [271] 2353  610 1068 1406  985  590 2211  779  813 1488 1230 1076  595 1079  461
## [286]  635  284  233  482  317 1272 1028  659  605  738 1312  338 1250  659  888
## [301] 1217 1738  396  559  979 1185  292  240  554  277 1398  545  879  723 1054
## [316]  768  478  829 1441  717 1308 1478  489  574 1632 1755  446 1018 1841  716
## [331] 1216  716  859  843  758  523 1483 1014  906  852  692  551  778 2371 2228
## [346] 1344  596 1800  777 1009  228 1204 1258 1290 1161  523  285  662 1186  777
## [361] 1722  259 1522  719  707  738  807 1336  968 1159 1192 2565 1007  535  882
## [376] 1146  892 1590 1208  962   19  820  628  763  256 1638 1275  897  813  467
## [391] 1216 1350 1226  243  657  771  492 1411 1246 1034  808  891  452  934  507
## [406] 2559  895  735 1736 2095  972  446  529  354 1223  454 1498 1293 1135  731
## [421]  840  881 1582  918  567 1210   95  978 1160 1458 1021  892   65 1275 1270
## [436]  744 1214  981  948  245  385  782 1432 1294  836  555  655  453 1384 1467
## [451]  836 1374 1308 1047  547  616 1039

Reads in all AML samples (mutated and non-mutated) for the canonical intron 5-6 splice junction:

GeneSJ$chr21_43094789_43095437 #SJ Intron 5 to 6
##   [1]  652  340  757  224  196  677  721  788  396  439  832  694  413  387  308
##  [16]  590  888  875  678  530  169  715  423  569  548  419  629  244  481  878
##  [31]  666  464  769  426  284  560  415  378  471  520  628 1060 1428  709  515
##  [46]  882  427  284  290  282  475  142  236  642  350  487  336 1377  758 1076
##  [61]  402   93  469  445  395  383  501  579  656  728 1127  336  361  858  801
##  [76]  704  388 1290  306  773  606  432  231  892  445  433  227  970  568  440
##  [91]  714  493  745  770  295  720  643  817  385  619  528  386  817  318  423
## [106]  869  829  442  178  319  987  428  401  808  573  858  702  836  322  754
## [121]  275 1259  493  574 1144  380  500  435  443  399  453  365  432  425  287
## [136]  579  288  104  948  145   63  684  390  466  792  237  272  363  994  491
## [151]  361 1077  476  662  662  541 1163  512  193  433  501  689  490  300  728
## [166]  856  287  305  474  426  260  607  232  302  373 1236  685  723  687  866
## [181]  470  916  745  244  207  429  393  767  556  370  342  246  422  475  993
## [196]  181  446  516  945  637  515  391  447  492  504  510  650 1170  567  526
## [211]  266  677  140  468  330  408  252  685  556  523  255 1003  486  871  584
## [226]  328  730  304  563  625 1838  477  657  598  581  386  372  878  312  533
## [241]  461  633  487  408  435  226 1046  374  248  329  229  468  698  827  388
## [256]  971  354  828  828  627  572  339  362  669  438  258  545  980  406  673
## [271] 1611  389  682  407  487  496  630  512  480  473  648  763  613  465  327
## [286]  558  295  113  199  248  516  572  496  427  414 1110  274  748  428  473
## [301] 1039  716  249  255  499  649  188  153  420  197  739  240  431  499 1029
## [316]  537  346  425  993  390  695  606  317  371  869  631  263  390  655  360
## [331] 1161  526  422  492  751  347  640  676  483  570  487  306  382 1242 1710
## [346]  446  534  513  306  728  211  530 1384  627  766  301  172  626  582  301
## [361]  518  134  773  406  521  513  533  682  292 1074 1049 1059  684  366  388
## [376]  632  483  840  636  687    9  577  420  459  117  601  784  593  576  322
## [391]  712  679  450  153  428  508  348  603  771  541  766  243  269  609  304
## [406]  837  592  319  524  727  817  302  302  223  745  416  399  386  332  206
## [421]  271  727  793  516  284  491   77  407  565  779  378  566   80  634  330
## [436]  773  711  605  631  273  225  625  958  628  762  361  459  456 1132  429
## [451]  372  669  561  720  214  320  372

Reads in all AML samples (mutated and non-mutated) for the canonical intron 6-7 splice junction:

GeneSJ$chr21_43094564_43094654 #SJ Intron 6 to 7
##   [1]  576  706  653 1638  470  707  754  754  378  587  763  594  594  926  343
##  [16]  559  793  756  993  529  463  715  487 1431  897  535  713  295  780  723
##  [31]  713  636  692  425  350  759  419  584  534  785  582  953 1152  676  701
##  [46] 1065  494  707  372  275  679  523  288  802  491  511  732 1475  820  916
##  [61]  702  523  380  719  364  549  479  641  558  653  912  412  680  798  899
##  [76]  674  952 1151  762  689  693  392  829  802  685  480  606  873  739  381
##  [91]  812  732 1207 1003  516  632  575  709  389  611  679  465  712  455  426
## [106]  770  978  525  178  349  935  400  686  802  553  782  741  727  433  664
## [121]  527 1346  549  562 1131  774  653  377  433  642  444  400  399  496  336
## [136]  518  434  318  877  368  345  624 1055  571  631  502  286  805  993  532
## [151]  347 1334  677  637  618  519 1025  515  259  587  456  784  398  546  816
## [166]  735  494  497  463  470  597  612  351  628  367 1422  781  642  697  881
## [181]  452  783  752  361  596  449  471  984  519  501  478  530  761  525  883
## [196]  432  446  459  780  561  499  718  700  991  486  509  708 1036  557  682
## [211]  444  741  369  460  925  634  337  641  571  719  599 1152  511  868  614
## [226]  501  748  378  513  572 1671  724 1036  707  733  430  994  740  293  505
## [241]  428  698  463  428  415  827 1392  341  236  496  583  648  641  707  387
## [256]  896  474  718  817  587  545  410  442  666  493  676  545  882  501  720
## [271] 2183  531  613  428  470  432  682  432  563  482  616  796  717  699  627
## [286]  497  586  346  259  324  531  586  730  441  422  836  344  682  506  408
## [301]  835  986  434  258  489  560  296  407  556  349  605  246  505  570  861
## [316]  487  442  512 1092  666  825  563  576  420  844  722  201  438  761  419
## [331] 1003  589  684  525  728  358  598  638  559  483  747  312  300 1236 1460
## [346]  491  562  516  423  642  532  599 1364  833 1179  640  494  756  687  497
## [361]  551  540  867  730  694  450  470  999  585  816  911 1155  706  756  842
## [376]  707  462  941  611  597   15  665  443  764  594  620  955  538  523  297
## [391]  769  564  474  560  511  469  328  528  697  421  648  509  418  630  401
## [406]  814  647  525  639  746  688  418  253  418  727  680  547  425  348  541
## [421]  445  619 1090  838  295  560  317  460  557  688  459  551   70  614  361
## [436]  915  724  840  498  594  536  683  761  698  791  409  572  803  915  446
## [451]  364  610  542  728  428  449  397

6.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx4_5 <- (GeneSJ$chr21_43095537_43095693)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx5_6 <- (GeneSJ$chr21_43094789_43095437)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx6_7 <- (GeneSJ$chr21_43094564_43094654)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_Ex6DG <- (GeneSJ$chr21_43094564_43094666)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES5 <- (GeneSJ$chr21_43094789_43095693)/GeneSJ$rowSum_SJtotal*100
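
As a worked illustration of this normalization, the sketch below applies the same arithmetic to a toy table with made-up counts: each junction is expressed as a percentage of all junction reads of the gene in that sample. Column and object names are illustrative.

```r
# Toy junction counts for two samples (made-up values).
toy <- data.frame(
  sample_id     = c("S1", "S2"),
  chr21_100_200 = c(80, 10),   # e.g. a canonical junction
  chr21_100_150 = c(20, 30),   # e.g. an alternative junction
  chr21_300_400 = c(100, 60)
)

# Total junction reads per sample, restricted to columns named "chr...".
sj_cols <- grep("^chr", names(toy))
toy$rowSum_SJtotal <- rowSums(toy[, sj_cols])

# Percentage of the gene's junction reads that support the alternative junction.
toy$Normalized_alt <- toy$chr21_100_150 / toy$rowSum_SJtotal * 100
toy[, c("sample_id", "rowSum_SJtotal", "Normalized_alt")]
```

Here S1 has 20 of 200 junction reads on the alternative junction and S2 has 30 of 100. A sample with no junction reads at all would yield `NaN` (0/0), so such rows need handling before any downstream test.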

Download the normalized values for the assessed splice junctions of all the AML samples:

6.3 VAF

VAF of the mutated samples:

6.4 Plots

6.4.1 Static Dot Plots

Canonical SJ:

Splicing alterations:

6.4.2 Interactive Dot Plots

Canonical SJ:

Splicing alteration:

6.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

6.5 Statistical Analysis

SJCounts <- GeneSJ 

6.5.1 Donor Gain

6.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "WT"]
## W = 0.3147, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.0007982023

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.08409014 0.11040236 0.04923441

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_Ex6DG - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.08329194 0.10960415 0.04843620

6.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:39] = -0.0007982, 0.0034493, 0.0041878,  ..., 0.016684, 0.017392
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1 1 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex6DG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF
MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr21:43094564-43094666"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
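
The ECDF value of a MUT sample is the fraction of WT values less than or equal to it, so `1 - ECDF` is the fraction of WT samples strictly above it. When a MUT value exceeds every WT value, as for the three samples here, that quantity is exactly 0. Subtracting the WT mean from both groups shifts all values by the same constant and leaves the ECDF result unchanged. The sketch below illustrates this on made-up numbers. It also shows a common add-one estimate that cannot reach zero; this is an alternative shown for comparison, not the method used in this report.

```r
# Made-up WT reference values and two MUT observations.
wt  <- c(0, 0, 0, 0.004, 0.01, 0.017)
mut <- c(0.01, 0.08)

wt_ecdf <- ecdf(wt)

# Definition used above: share of WT values strictly greater than the MUT value.
p_upper <- 1 - wt_ecdf(mut)

# Alternative (not used in the report): count WT values >= the MUT value and
# add one to numerator and denominator so the estimate cannot be zero.
p_plus_one <- sapply(mut, function(x) (sum(wt >= x) + 1) / (length(wt) + 1))

data.frame(mut, p_upper, p_plus_one)
```

With a reference set of n WT samples, the smallest value the add-one estimate can take is 1/(n + 1), which makes the resolution limit of the empirical distribution explicit.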

Download the VAF, inferred percentiles and p-values of the mutated samples:

6.5.1.3 T-test

Normality:

shapiro.test(SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "WT"]
## W = 0.3147, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_Ex6DG[SJCounts$GROUP == "MUT"]
## W = 0.99354, p-value = 0.8463

Variances:

res.ftest <- var.test(Normalized_Ex6DG ~ GROUP, SJCounts, alternative = "two.sided", conf.level=0.95)
res.ftest
## 
##  F test to compare two variances
## 
## data:  Normalized_Ex6DG by GROUP
## F = 119.25, num df = 2, denom df = 453, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##    32.06533 4710.00312
## sample estimates:
## ratio of variances 
##           119.2536

Welch Two Sample t-test:

t <- t.test(Normalized_Ex6DG ~ GROUP, data=SJCounts,
 alternative="two.sided")
t
## 
##  Welch Two Sample t-test
## 
## data:  Normalized_Ex6DG by GROUP
## t = 4.5409, df = 2.0002, p-value = 0.04522
## alternative hypothesis: true difference in means between group MUT and group WT is not equal to 0
## 95 percent confidence interval:
##  0.004228648 0.156659550
## sample estimates:
## mean in group MUT  mean in group WT 
##      0.0812423016      0.0007982023
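
The Welch degrees of freedom of about 2 reflect that almost all of the standard error comes from the 3-sample MUT group. The sketch below computes the Welch-Satterthwaite approximation by hand and compares it with `t.test`. The data are simulated with the report's group sizes (3 and 454); the values themselves are made up and do not reproduce the figures above.

```r
set.seed(1)
# Simulated data with the report's group sizes; the values are made up.
mut <- rnorm(3,   mean = 0.08,  sd = 0.03)
wt  <- rnorm(454, mean = 0.001, sd = 0.003)

# Per-group squared standard errors of the mean.
se2_mut <- var(mut) / length(mut)
se2_wt  <- var(wt)  / length(wt)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_mut + se2_wt)^2 /
  (se2_mut^2 / (length(mut) - 1) + se2_wt^2 / (length(wt) - 1))

# t.test() defaults to var.equal = FALSE, i.e. the Welch test.
fit <- t.test(mut, wt)
c(manual = df_welch, t.test = unname(fit$parameter))
```

When one group's squared standard error dominates, the approximation approaches that group's own degrees of freedom (n - 1), so the large WT group adds very little to the test's precision.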

6.5.2 Canonical Donor Loss

6.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"]
## W = 0.90443, p-value = 2.792e-16

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 5.395745

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.473596 4.882237 3.702427

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx6_7 - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.9221498 -0.5135080 -1.6933182

6.5.2.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:454] = -2.4939, -2.4202, -2.2311,  ..., 5.4026, 10.996
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.22687225 0.37665198 0.05506608
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx6_7")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA
MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr21:43094564-43094654"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the VAF, inferred percentiles and p-values of the mutated samples:

6.5.2.3 T-test

Normality:

shapiro.test(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"]
## W = 0.90443, p-value = 2.792e-16
shapiro.test(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "MUT"]
## W = 0.96949, p-value = 0.6647

Variances:

res.ftest <- var.test(Normalized_CanonEx6_7 ~ GROUP, SJCounts, alternative = "two.sided", conf.level=0.95)
res.ftest
## 
##  F test to compare two variances
## 
## data:  Normalized_CanonEx6_7 by GROUP
## F = 0.21711, num df = 2, denom df = 453, p-value = 0.3902
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.05837787 8.57499090
## sample estimates:
## ratio of variances 
##          0.2171121

Two Sample t-test:

t <- t.test(Normalized_CanonEx6_7 ~ GROUP, data=SJCounts,
 alternative="two.sided",
 var.equal=TRUE)
t
## 
##  Two Sample t-test
## 
## data:  Normalized_CanonEx6_7 by GROUP
## t = -1.4028, df = 455, p-value = 0.1614
## alternative hypothesis: true difference in means between group MUT and group WT is not equal to 0
## 95 percent confidence interval:
##  -2.5041422  0.4181582
## sample estimates:
## mean in group MUT  mean in group WT 
##          4.352753          5.395745

6.5.3 Exon 5 Skipping

6.5.3.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES5[SJCounts$GROUP == "WT"]
## W = 0.74617, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES5[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.7802697

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_ES5[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.09249916 0.19627085 0.11816257

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES5 - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.6877706 -0.5839989 -0.6621072

6.5.3.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:438] = -0.78027, -0.7648, -0.76276,  ..., 3.8119, 5.9144
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.1806167 0.4008811 0.2621145
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES5")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr21:43094789-43095693"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the VAF, inferred percentiles and p-values of the mutated samples:

6.5.3.3 T-test

Normality:

shapiro.test(SJCounts$Normalized_ES5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES5[SJCounts$GROUP == "WT"]
## W = 0.74617, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_ES5[SJCounts$GROUP == "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES5[SJCounts$GROUP == "MUT"]
## W = 0.92154, p-value = 0.4578

Variances:

res.ftest <- var.test(Normalized_ES5 ~ GROUP, SJCounts, alternative = "two.sided", conf.level=0.95)
res.ftest
## 
##  F test to compare two variances
## 
## data:  Normalized_ES5 by GROUP
## F = 0.0029752, num df = 2, denom df = 453, p-value = 0.005941
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.0007999698 0.1175057212
## sample estimates:
## ratio of variances 
##        0.002975154

Welch Two Sample t-test:

t <- t.test(Normalized_ES5 ~ GROUP, data=SJCounts,
 alternative="less",
 var.equal=FALSE)
t
## 
##  Welch Two Sample t-test
## 
## data:  Normalized_ES5 by GROUP
## t = -11.51, df = 20.308, p-value = 1.176e-10
## alternative hypothesis: true difference in means between group MUT and group WT is less than 0
## 95 percent confidence interval:
##        -Inf -0.5481032
## sample estimates:
## mean in group MUT  mean in group WT 
##         0.1356442         0.7802697

6.5.4 Canonical SJ Intron 4-5

6.5.4.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"]
## W = 0.96736, p-value = 1.629e-08

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 8.08144

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 10.250589  7.961237 10.979272

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx4_5 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  2.1691485 -0.1202036  2.8978322

6.5.4.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:454] = -6.8306, -6.0433, -5.7787,  ..., 3.8771, 3.9066
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.8678414 0.4030837 0.9405286
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx4_5")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$Prediction <- "Canonical Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr21:43095537-43095693"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the VAF, inferred percentiles and p-values of the mutated samples:

6.5.4.3 T-test

Normality:

shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "WT"]
## W = 0.96736, p-value = 1.629e-08
shapiro.test(SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx4_5[SJCounts$GROUP == "MUT"]
## W = 0.91816, p-value = 0.4459

Variances:

res.ftest <- var.test(Normalized_CanonEx4_5 ~ GROUP, SJCounts, alternative = "two.sided", conf.level=0.95)
res.ftest
## 
##  F test to compare two variances
## 
## data:  Normalized_CanonEx4_5 by GROUP
## F = 0.57596, num df = 2, denom df = 453, p-value = 0.8748
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.1548666 22.7480037
## sample estimates:
## ratio of variances 
##          0.5759618

Two Sample t-test:

t <- t.test(Normalized_CanonEx4_5 ~ GROUP, data=SJCounts,
 alternative="two.sided",
 var.equal=TRUE)
t
## 
##  Two Sample t-test
## 
## data:  Normalized_CanonEx4_5 by GROUP
## t = 1.3731, df = 455, p-value = 0.1704
## alternative hypothesis: true difference in means between group MUT and group WT is not equal to 0
## 95 percent confidence interval:
##  -0.7110516  4.0089030
## sample estimates:
## mean in group MUT  mean in group WT 
##          9.730366          8.081440

6.5.5 Canonical SJ Intron 5-6

6.5.5.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "WT"]
## W = 0.99126, p-value = 0.008802

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 4.531591

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.128826 6.010795 3.667963

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx5_6 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.4027648  1.4792040 -0.8636277

6.5.5.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:454] = -3.298, -3.2718, -3.066,  ...,  4.387, 4.6644
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.3854626 0.9030837 0.2665198
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx5_6")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$Prediction <- "Canonical Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr21:43094789-43095437"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

Download the VAF, inferred percentiles and p-values of the mutated samples:

6.5.5.3 T-test

Normality:

shapiro.test(SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "WT"]
## W = 0.99126, p-value = 0.008802
shapiro.test(SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx5_6[SJCounts$GROUP == "MUT"]
## W = 0.89075, p-value = 0.3566

Variances:

res.ftest <- var.test(Normalized_CanonEx5_6 ~ GROUP, SJCounts, alternative = "two.sided", conf.level=0.95)
res.ftest
## 
##  F test to compare two variances
## 
## data:  Normalized_CanonEx5_6 by GROUP
## F = 0.88354, num df = 2, denom df = 453, p-value = 0.8281
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.2375683 34.8958559
## sample estimates:
## ratio of variances 
##           0.883536

Two Sample t-test:

t <- t.test(Normalized_CanonEx5_6 ~ GROUP, data=SJCounts,
 alternative="two.sided",
 var.equal=TRUE)
t
## 
##  Two Sample t-test
## 
## data:  Normalized_CanonEx5_6 by GROUP
## t = 0.092767, df = 455, p-value = 0.9261
## alternative hypothesis: true difference in means between group MUT and group WT is not equal to 0
## 95 percent confidence interval:
##  -1.431801  1.573676
## sample estimates:
## mean in group MUT  mean in group WT 
##          4.602528          4.531591

7 WT1 chr11,32396364,G,T

Variant found in 3 patients of the BeatAML (3 samples)

  • Patients with WT1 chr11,32396364,G,T variant: 3 patients (3 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Acceptor Loss: chr11:32396408-32399947; acceptor splice site chr11:32396408.
  • Donor Gain: predicted at 3bp from the variant position, chr11:32396366, not found in the splice junction collection.
  • Exon Skipping: chr11:32392756-32399947, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"WT1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="WT1" & found_variants$MutationKey_Hg38 == "chr11,32396364,G,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
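
Because the grouping relies on matching `RNA_Sample` against `sample_id`, a validable sample that is absent from the junction table is silently dropped from the MUT group. A short sanity check can make this visible; the sketch below uses toy identifiers and a stand-in data frame (`GeneSJ_toy`), not the BeatAML data:

```r
# Toy stand-ins for `cases` and `GeneSJ`; identifiers are hypothetical.
cases <- c("S1", "S9")
GeneSJ_toy <- data.frame(sample_id = c("S1", "S2", "S3"))

# Same grouping rule as in the report.
GeneSJ_toy$GROUP <- ifelse(GeneSJ_toy$sample_id %in% cases, "MUT", "WT")

# Validable samples with no row in the junction table.
unmatched <- setdiff(cases, GeneSJ_toy$sample_id)

# Number of samples that actually ended up in the MUT group.
n_mut <- sum(GeneSJ_toy$GROUP == "MUT")
```

Comparing `n_mut` with `length(cases)` shows at a glance whether every validable sample entered the MUT group.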

7.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

7.1.1 Donor Gain

Search: predicted at 3bp from the variant position 32396364, chr11,32396366

Show all the splice junctions containing positions between 32396360 - 32396369

colnames(GeneSJ)[grep("3239636",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.
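
The `grep()` lookup matches any column name that contains the digit prefix as text. An alternative is to parse the coordinates out of the column names and filter them numerically. The sketch below assumes the `chr_SJstart_SJend` naming described above; `find_sj_in_window()` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: junction columns whose start or end lies in [from, to].
# Assumes junction columns are named "<chr>_<start>_<end>".
find_sj_in_window <- function(sj_names, chr, from, to) {
  pattern <- paste0("^", chr, "_([0-9]+)_([0-9]+)$")
  hits <- sj_names[grepl(pattern, sj_names)]
  starts <- as.numeric(sub(pattern, "\\1", hits))
  ends <- as.numeric(sub(pattern, "\\2", hits))
  hits[(starts >= from & starts <= to) | (ends >= from & ends <= to)]
}

# Column names taken from this section, plus a non-junction column.
sj_names <- c("sample_id", "chr11_32392756_32399947",
              "chr11_32396408_32399947", "chr11_32392756_32396256")

found_none <- find_sj_in_window(sj_names, "chr11", 32396360, 32396369)
found_one <- find_sj_in_window(sj_names, "chr11", 32396400, 32396410)
```

A numeric window also handles ranges that cross a digit boundary (for example 32396395 to 32396405), which a single text prefix cannot express.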

7.1.2 Exon Skipping

Search: chr11:32392756-32399947

Show all the splice junctions containing the position 32392756-32399947

colnames(GeneSJ)[grep("32392756_32399947",colnames(GeneSJ))]
## [1] "chr11_32392756_32399947"

Found: chr11_32392756_32399947

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr11_32392756_32399947
##   [1] 0 5 1 0 0 2 0 2 0 0 1 0 2 0 1 1 4 1 1 0 1 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1
##  [38] 0 0 0 0 2 0 0 2 0 0 0 0 0 0 0 1 0 0 0 0 1 4 0 1 0 0 0 0 2 0 0 0 0 0 8 0 1
##  [75] 0 0 2 0 1 0 2 0 1 8 0 1 0 1 0 0 1 1 0 3 2 0 0 0 0 0 2 0 0 0 0 0 2 0 0 0 0
## [112] 0 1 0 0 1 0 1 0 0 0 0 0 0 2 0 0 0 0 2 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
## [149] 1 0 0 0 2 2 0 1 4 1 0 0 0 1 0 1 3 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 2 2 0 0 0
## [186] 0 0 0 1 0 0 0 1 0 1 0 1 0 2 3 1 0 0 0 0 0 1 3 0 0 3 0 0 0 1 0 0 0 3 0 0 5
## [223] 0 0 3 0 0 0 0 0 0 3 0 0 1 0 0 0 0 4 0 0 1 1 0 0 4 0 0 1 5 0 0 0 0 1 0 0 0
## [260] 1 0 0 0 1 0 7 0 2 0 0 0 0 0 0 0 1 1 0 0 0 2 0 1 2 5 0 4 0 0 0 0 0 0 0 0 0
## [297] 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 2 1 3 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 5 1 0
## [334] 0 6 0 0 1 1 0 0 0 0 2 0 0 2 1 1 0 2 0 0 1 0 0 0 0 1 2 0 0 0 0 0 0 1 1 1 0
## [371] 0 0 0 0 3 0 0 0 0 0 0 3 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 1 0 1 0 0
## [408] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 8 0 1 2 0 0 2 1 1 0 0 0 0 2 1 1 0 0 4 0 1 0
## [445] 0 0 0 0 2 0 0 1 1 0 2 0 0

Samples with the SJ of interest:

table(GeneSJ$chr11_32392756_32399947>0) 
## 
## FALSE  TRUE 
##   313   144

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr11_32392756_32399947 > 0])
## 
## MUT  WT 
##   1 143

Alternative SJ found in the mutated samples.
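
The two tables above can also be combined into a single cross-tabulation of group against junction presence, which additionally shows how many samples in each group lack the junction. A minimal sketch on toy counts (the values are illustrative only):

```r
# Toy data: one MUT and three WT samples with read counts for one junction.
toy <- data.frame(
  GROUP = c("MUT", "WT", "WT", "WT"),
  chr11_32392756_32399947 = c(3, 0, 2, 0)
)

# Rows: sample group; columns: whether the junction has at least one read.
tab <- table(GROUP = toy$GROUP,
             SJ_present = toy$chr11_32392756_32399947 > 0)
tab
```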

7.1.3 Canonical SJ

Exon 6-7: chr11:32396408-32399947; acceptor splice site: chr11:32396408

Exon 7-8: chr11:32392756-32396256

Reads of all the AML samples (mutated and non-mutated) for the canonical splice junction chr11:32396408-32399947:

GeneSJ$chr11_32396408_32399947
##   [1]  66  47   7   3   2  20 106 161   0  23   9   0 101  10  84  47 101 120
##  [19]  57  30  55  50  30  43   0  69  16   2   0  79   1  64   0  36   0  25
##  [37]  20   6   0   3   0 151   0 110  16   1   5   9  14   1  64   0   6  63
##  [55]   4   6   1  54  63  58  42   3   0   9   0 120  13   3  12   0   6  14
##  [73]  82 134  45   9  32   3  51  19 162  11  10 172   0  29  60  68   0   3
##  [91]  51  69   4 262  26   0  67 102   6 105  98  51   1  22 120  16  68  20
## [109]  18   0   5  28  63  30   0  80 183 103   1  14   2   3  39   6  35   6
## [127]   6   0   0  80   3  30   2  50   7  15  44   1  49   2   0   0  62   3
## [145]  10   1   3  42  78  47   0   0  96 100   2  57 169  33  10  16  12  58
## [163]   2  68  68   0   0  40  43   2   0   6  28  29  22   2   0  54   7   1
## [181]  50 138  44   0   0   2   0   0 107   6   1   6  10  39 122   5 103  49
## [199]  75  94 161   1   2  51   0  23 213 159  21  17  51   0  13   5   4   0
## [217]   6 103  72  86   0  66   3  31  95   8   0   0  34   1   3 144   6   0
## [235]  68  23   0 104  10  32   7   2  53  55  52  42  89   2   4  97  60  22
## [253]  51   7  41   8   0   0  77  22  37  13   0 103   0 124 203 153  30  55
## [271]  13   0   3   0  11  17  20  21   4  30 114  15  24  46  22   0  22   1
## [289]   0  12  81 103   0  15   0  55  19   8   9  10  54  15  10   0  30   8
## [307]   0  27  20   0   5  34   7  27  16   3  59   4  10  83   0   7   4  29
## [325]  88 163   2  12  29   0 107  73  76   1 110  31  33 113  35   0   0   0
## [343]  30 120  25  61  28  73  54  55  49  29  41 128   5   0  18   7  73  53
## [361]   7   3   0  18   0   9  55  17  51  35   5   2   0   1  72   0   3  31
## [379]   1 101   5  49   0  37   0 124   7   9  25   9 197  42  22   0   4 107
## [397]   0  10   0   0  77   0   6  10  47  43  10  17  34  23   3   3   2   5
## [415]  14   3   7  12   0  28  25  27 324  39  20  51   0   7 111  57 216  24
## [433]   0  12   0  56  27  41  87   0 122  82  52  17  12   7   0   5 156  56
## [451]  30  66  78  12  80   1   0

Reads of all the AML samples (mutated and non-mutated) for the canonical splice junction chr11:32392756-32396256:

GeneSJ$chr11_32392756_32396256
##   [1]  61  55   2   2   0  34  83 151   0  16  11   0 103   7  59  47  58  92
##  [19]  49  26  68  63  27  49   2  62  16   3   1  68   4  61   0  41   0  47
##  [37]  27   4   0   4   0  99   0  96  13   0   1   7  16   0  59   0   5  30
##  [55]   8   9   0  47  47  39  29   4   1   3   0 100   3   2  21   0   9  38
##  [73] 102  82  41   9  31   8  48  17 117  11  12 113   0  21  58  54   0   4
##  [91]  33  71   4 222  41   0  68 105  15  98  73  50   0  47  86  22 109  14
## [109]  12   0   2  17  75  28   1  83 143  99   1  18   5   4  28   7  20   1
## [127]   6   0   0  71   1  32   3  45   2   8  32   3  47   3   0   6  74   2
## [145]  13   1   2  41  79  30   0   0 100 130   4  62 168  50   3  11  12  73
## [163]   1  85  66   0   0  60  30   4   0   5  41  28   9   2   4  60   4   3
## [181]  68 126  25   0   0   1   0   0  89  10   4   4  14  33  99   5  71  49
## [199]  39  66 119   0   7  38   0  15 127 122  11  13  43   0  11   1   1   2
## [217]   7  79  65  76   0  69   2  33  99   9   0   0  25   2   0 168   2   0
## [235]  52  18   0 129   7  44   9   3  69  52  55  45  98   0   1  81  77  25
## [253]  42   3  47   4   0   2  75  19  34  17   1  93   0 113 138 124  33  66
## [271]  11   4   2   0  10  12  17  21   0  15  72  23  23  72  25   1  34   1
## [289]   0  18  73 136   0   9   0  46  26  33  22  17  57  25   7   0  24  14
## [307]   0  27  21   0   4  31   5  27  27   2  57   0   7 100   2   3   8  27
## [325]  53  93   0  16  26   0  97  81  82   3  99  27  30  95  37   0   4   0
## [343]  26 117  24  55  21  59  44  50  73  31  30 149   5   0  17   7  75  40
## [361]  18   5   0  11   0   6  61  23  46  31   7   2   0   1  71   0   8  26
## [379]   0  84   2  42   3  41   0 105   4   4  33   6 174  31  21   1   1  99
## [397]   0  11   0   0  58   0  21   9  43  44  10  18  27  22   2  11   4  13
## [415]  11   5   9  14   0  29  23  12 374  35  28  79   0   0  94  35 120  32
## [433]   0  13   1  41  22  53  76   0 141  90  44   7  85   2   0   3 134  47
## [451]  25  72  87  30  93   2   0

7.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonEx6_7 <- (GeneSJ$chr11_32396408_32399947)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonEx7_8 <- (GeneSJ$chr11_32392756_32396256)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES7 <- (GeneSJ$chr11_32392756_32399947)/GeneSJ$rowSum_SJtotal*100
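
Samples with no junction reads in the gene have `rowSum_SJtotal == 0`, so the percentage is 0/0 and evaluates to `NaN`. A hedged sketch of the same normalization with an explicit guard; `normalize_sj()` is a hypothetical name, returning `NA` for zero totals is an assumption about the desired behaviour, and the column names and counts are toy values:

```r
# Hypothetical helper: percentage of the gene's junction reads, NA if no reads.
normalize_sj <- function(counts, total) {
  out <- counts / total * 100
  out[total == 0] <- NA_real_
  out
}

# Toy junction table with two junction columns; the second sample has no reads.
toy <- data.frame(chr11_1_2 = c(66, 0, 7), chr11_3_4 = c(34, 0, 3))

# Anchoring the pattern with "^chr" avoids picking up other columns
# whose names merely contain "chr".
toy$rowSum_SJtotal <- rowSums(toy[, grep("^chr", names(toy)), drop = FALSE])
toy$Normalized <- normalize_sj(toy$chr11_1_2, toy$rowSum_SJtotal)
```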

Download the normalized values for the assessed splice junctions of all the AML samples:

7.3 VAF

VAF of the mutated samples:

7.4 Plots

7.4.1 Static Dot Plots

Canonical SJ:

Splicing alterations:

7.4.2 Interactive Dot Plots

Canonical splice junction: Exon 6-7: chr11:32396408-32399947; acceptor splice site: chr11:32396408

Splicing alteration:

7.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

7.5 Statistical Analysis

SJCounts <- GeneSJ

7.5.1 Acceptor Loss

7.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"]
## W = 0.67926, p-value < 2.2e-16

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 10.21563

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx6_7[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 8.379888

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx6_7 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -1.835745

7.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:322] = -10.216, -8.1954, -8.0847,  ..., 29.784, 89.784
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.2873832
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx6_7")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr11:32396408-32399947"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

7.5.2 Exon Skipping

7.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES7[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES7[SJCounts$GROUP == "WT"]
## W = 0.43233, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES7[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1515082

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_ES7[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.2793296

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES7 - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.1278214

7.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:139] = -0.15151, -0.09786, -0.088257,  ..., 3.1818, 4.0152
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.8481308
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES7")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:32392756-32399947"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
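
The two tail conventions used in this section (`Pvalue <- ECDF` for the loss of a canonical junction, `Pvalue <- 1 - ECDF` for an exon-skipping junction) can be wrapped in one function. The sketch below is an illustration on toy values; `empirical_pvalue()` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: empirical tail probability of MUT values against WT.
# tail = "lower" mirrors Pvalue <- ECDF (canonical junction loss);
# tail = "upper" mirrors Pvalue <- 1 - ECDF (alternative junction gain).
empirical_pvalue <- function(wt, mut, tail = c("lower", "upper")) {
  tail <- match.arg(tail)
  f <- stats::ecdf(wt)   # step function: proportion of WT values <= x
  p <- f(mut)
  if (tail == "upper") p <- 1 - p
  p
}

# Toy WT distribution (not BeatAML values).
wt <- c(0, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)

p_lower <- empirical_pvalue(wt, 0.1, "lower")   # 3 of 10 WT values are <= 0.1
p_upper <- empirical_pvalue(wt, 0.75, "upper")  # 1 of 10 WT values is > 0.75
```

Because the ECDF depends only on the ordering of the values, subtracting the WT mean from both groups, as done with `Difference`, gives the same percentile as using the normalized values directly.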

Download the VAF, inferred percentiles and p-values of the mutated samples:

8 WT1 chr11,32392696,G,A

Variant found in 1 patient of the BeatAML (1 sample)

  • Patients with WT1 chr11,32392696,G,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Loss: chr11:32392065-32392665; donor splice site chr11:32392665.
  • Exon Skipping: chr11:32392065-32396256, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"WT1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="WT1" & found_variants$MutationKey_Hg38 == "chr11,32392696,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

8.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

8.1.1 Exon Skipping

Search: chr11:32392065-32396256

Show all the splice junctions containing the position chr11:32392065-32396256

colnames(GeneSJ)[grep("32392065_32396256",colnames(GeneSJ))]
## [1] "chr11_32392065_32396256"

Found: chr11_32392065_32396256

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr11_32392065_32396256
##   [1]  0  1  0  0  1  0  1  1  0  0  0  0  3  1  0  0  1  1  1  1  1  2  1  1  0
##  [26]  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  2  0  4  1  0  0  0  0  0
##  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  2  0  0  0  0  0  0  1  1  0
##  [76]  1  0  0  0  0  1  0  0  0  0  1  0  2  0  0  1  3  0  4  0  0  0  0  0  0
## [101]  1  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  1  0  0  0  0  1  0  0
## [126]  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
## [151]  0  0  0  0  0  2  2  1  0  0  0  0  0  2  1  0  0  0  0  0  0  0  0  0  0
## [176]  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 15  0
## [201]  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  1  0  2  0
## [226]  0  0  0  2  0  0  0  0  0  2  0  0  0  0  0  0  0  0  1  0  0  0  0  0  1
## [251]  1  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  2  4  0  0  0  0  0  0  0
## [276]  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0
## [301]  0  1  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
## [326]  1  0  0  0  0  0  0  1  0  2  0  1  0  2  0  0  0  0  1  0  0  1  0  0  0
## [351]  0  0  1  1  0  0  2  0  2  1  0  1  0  0  0  2  2  0  1  1  0  0  0  0  1
## [376]  0  0  0  0  1  0  0  0  0  0  1  0  0  0  0  2  0  0  0  0  1  0  0  0  0
## [401]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  4  0  0
## [426]  1  0  0  0  1  3  0  0  2  0  1  0  0  1  0  2  0  1  0  0  0  0  0  0  0
## [451]  0  0  1  0  4  0  0

Samples with the SJ of interest:

table(GeneSJ$chr11_32392065_32396256>0) 
## 
## FALSE  TRUE 
##   360    97

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr11_32392065_32396256 > 0])
## 
## MUT  WT 
##   1  96

Alternative SJ found in the mutated samples.

8.1.2 Canonical SJ

Exon upstream (UE): chr11:32392756-32396256

Exon downstream (DE): chr11:32392065-32392665; donor splice site chr11:32392665

Reads of all the AML samples (mutated and non-mutated) for the canonical splice junction chr11:32392756-32396256:

GeneSJ$chr11_32392756_32396256
##   [1]  61  55   2   2   0  34  83 151   0  16  11   0 103   7  59  47  58  92
##  [19]  49  26  68  63  27  49   2  62  16   3   1  68   4  61   0  41   0  47
##  [37]  27   4   0   4   0  99   0  96  13   0   1   7  16   0  59   0   5  30
##  [55]   8   9   0  47  47  39  29   4   1   3   0 100   3   2  21   0   9  38
##  [73] 102  82  41   9  31   8  48  17 117  11  12 113   0  21  58  54   0   4
##  [91]  33  71   4 222  41   0  68 105  15  98  73  50   0  47  86  22 109  14
## [109]  12   0   2  17  75  28   1  83 143  99   1  18   5   4  28   7  20   1
## [127]   6   0   0  71   1  32   3  45   2   8  32   3  47   3   0   6  74   2
## [145]  13   1   2  41  79  30   0   0 100 130   4  62 168  50   3  11  12  73
## [163]   1  85  66   0   0  60  30   4   0   5  41  28   9   2   4  60   4   3
## [181]  68 126  25   0   0   1   0   0  89  10   4   4  14  33  99   5  71  49
## [199]  39  66 119   0   7  38   0  15 127 122  11  13  43   0  11   1   1   2
## [217]   7  79  65  76   0  69   2  33  99   9   0   0  25   2   0 168   2   0
## [235]  52  18   0 129   7  44   9   3  69  52  55  45  98   0   1  81  77  25
## [253]  42   3  47   4   0   2  75  19  34  17   1  93   0 113 138 124  33  66
## [271]  11   4   2   0  10  12  17  21   0  15  72  23  23  72  25   1  34   1
## [289]   0  18  73 136   0   9   0  46  26  33  22  17  57  25   7   0  24  14
## [307]   0  27  21   0   4  31   5  27  27   2  57   0   7 100   2   3   8  27
## [325]  53  93   0  16  26   0  97  81  82   3  99  27  30  95  37   0   4   0
## [343]  26 117  24  55  21  59  44  50  73  31  30 149   5   0  17   7  75  40
## [361]  18   5   0  11   0   6  61  23  46  31   7   2   0   1  71   0   8  26
## [379]   0  84   2  42   3  41   0 105   4   4  33   6 174  31  21   1   1  99
## [397]   0  11   0   0  58   0  21   9  43  44  10  18  27  22   2  11   4  13
## [415]  11   5   9  14   0  29  23  12 374  35  28  79   0   0  94  35 120  32
## [433]   0  13   1  41  22  53  76   0 141  90  44   7  85   2   0   3 134  47
## [451]  25  72  87  30  93   2   0

Reads of all the AML samples (mutated and non-mutated) for the canonical splice junction chr11:32392065-32392665:

GeneSJ$chr11_32392065_32392665
##   [1]  74  69   7   4   0  60 123 187   0  19   9   0 133  14  78  47  77 109
##  [19]  53  50 131  80  52  47   0  75  20   0   3  86   8  71   0  37   0  47
##  [37]  30   8   0   7   2 143   0  88  19   0   1  10  17   0  85   1   4  43
##  [55]   5   9   1  55  95  51  65   8   1   8   0 120  13   2  22   0   2  77
##  [73] 104 134  45  14  44   8  56  35 211  14  31 182   1  49 107  66   0   7
##  [91]  52 108   4 344  80   0  63 109  10 132 132  63   0  64 129  29 134  23
## [109]  27   0   9  35  89  38   0  92 218 111   0  18   1   3  52  17  31   7
## [127]  11   0   1 116   4  46   3  85   3  18  61   2  22   2   0   7 110   2
## [145]  17   1   1  57 120  57   3   0 152 153   6  76 227  39   5  19  10  79
## [163]   2 136 100   0   0  90  34   4   0   7  60  52  13   2   3  81   8   6
## [181]  81 159  46   0   0   1   0   3  98   8   4   5  19  59 129   5  95  47
## [199]  74  88 136   2   8  52   0  21 206 150  22  19  61   0  10   1   3   0
## [217]   9 113  81 115   0 101  14  45  50   6   0   0  34   1   0 239   4   0
## [235] 104  27   0 168  13  41  12   5  62  85  58  57 149   1   2 122 100  26
## [253]  53   9  38   8   0   2 113  23  47  11   0 134   3 173 182 158  38  70
## [271]  17   2   1   0  12  14  27  19   3  26 101  28  32 116  31   0  60   5
## [289]   0  25  87 149   1   8   0  56  22  41  22  32  65  27  10   0  29  20
## [307]   0  33  25   0   4  42   3  41  14   2  74   2  11 174   5   4  13  39
## [325]  95 163   0  18  28   0 107 114 119   2 126  49  34 107  53   0   8   0
## [343]  32 141  28 105  19  70  80  46  68  41  56 201   4   1  28   9 109  61
## [361]  40   6   2  14   1   7  81  30  59  41   8   5   0   2 126   1   9  38
## [379]   0  75   0  54   4  62   0 130   3   2  32  11 233  44  28   1   2  90
## [397]   0  15   0   0  70   0  28  11  54  60  19  28  51  26   1   7   4   9
## [415]  18   4  21  21   2  52  44  24 484  51  36 134   0   9 143  68 216  34
## [433]   0  20   3  40  33  44  88   0 170  98  54  20 128   8   0   1 165  77
## [451]  27  66 122  33 146   1   0

8.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr11_32392065_32392665)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES <- (GeneSJ$chr11_32392065_32396256)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

8.3 VAF

VAF of the mutated samples:

8.4 Plots

8.4.1 Static Dot Plots

Canonical splice junction Exon downstream (DE): chr11:32392065-32392665; donor splice site chr11:32392665

Splicing alterations:

8.4.2 Interactive Dot Plots

Canonical splice junction Exon downstream (DE): chr11:32392065-32392665, donor splice site: chr11:32392665

Splicing alteration:

8.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

8.5 Statistical Analysis

SJCounts <- GeneSJ

8.5.1 Donor Loss

8.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.57436, p-value < 2.2e-16

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 13.82591

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 14.72471

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.898805

8.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:328] = -13.826, -11.326, -10.493,  ..., 52.841, 86.174
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6985981
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr11:32392065-32392665"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

8.5.2 Exon Skipping

8.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "WT"]
## W = 0.11859, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1147521

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_ES[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.128041

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.01328883

8.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:95] = -0.11475, -0.048527, -0.039621,  ..., 3.0102, 14.171
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.8317757
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr11:32392065-32396256"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

9 TP53 chr17,7675238,T,C

Variant found in 1 patient of the BeatAML (1 sample)

  • Patients with TP53 chr17,7675238,T,C variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Acceptor Loss: chr17:7675237-7675993; acceptor splice site chr17:7675237.
  • Acceptor Gain: chr17:7675216-7675993, found in the mutated samples.
  • Acceptor Gain: chr17:7675234-7676033, found in the splice junction collection, not found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"TP53_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="TP53" & found_variants$MutationKey_Hg38 == "chr17,7675238,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
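Because `%in%` silently ignores case IDs that have no row in the junction table, a quick sanity check can be useful. The following is a minimal sketch on toy stand-ins for `cases` and `GeneSJ`; the sample IDs are invented and `missing_cases` is a hypothetical name.

```r
# Toy stand-ins for `cases` and `GeneSJ` (hypothetical sample IDs).
cases  <- c("S02", "S09")
GeneSJ <- data.frame(sample_id = c("S01", "S02", "S03"),
                     stringsAsFactors = FALSE)

# Same group assignment as in the report.
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases, "MUT", "WT")

# Cases that never appear in the junction table would otherwise go unnoticed.
missing_cases <- setdiff(cases, GeneSJ$sample_id)

print(table(GeneSJ$GROUP))
print(missing_cases)
```

If `missing_cases` is not empty, the number of MUT rows will be smaller than the number of validable samples reported for the variant.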

9.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

9.1.1 Acceptor Gain

Search: chr17:7675216-7675993

Show all the splice junctions containing the position chr17:7675216-7675993

colnames(GeneSJ)[grep("7675216_7675993",colnames(GeneSJ))]
## [1] "chr17_7675216_7675993"

Found: chr17_7675216_7675993

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr17_7675216_7675993
##   [1]   1   0   0   0   0   2   3   1   0   0   3   1   2   0   1   2   0   4
##  [19]   0   0   0   1   0   0   1   0   2   0   1   3   0   2   3   0   0   0
##  [37]   1   1   1   0   2   0   2   4   3   1   0   0   1   1   0   0   3   3
##  [55]   0   1   0   0   4   0   0   0   0   0   1   0   1   0   0   1 226   0
##  [73]   1   2   0   1   1   7   0   2   4   0   0   3   0   3   0   3   1   1
##  [91]   5   3   1   0   0   1   3   2   2   1   1   1   2   0   2   4   0   0
## [109]   0   0   0   0   1   4   0   3   4   0   2   3   1   1   2   2   0   0
## [127]   0   0   1   1   0   0   0   2   0   1   2   0   2   0   0   1   0   0
## [145]   1   0   1   2   2   2   6   1   3   5   1   1   1   3   0   0   2   0
## [163]   2   2   5   0   0   0   0   0   0   1   2   0   0   0   6   1   2   3
## [181]   5   1   6   1   0   1   1   1   0   0   3   0   1   3   1   1   1   1
## [199]   2   3   3   0   0   1   1   0   0   1   1   0   2   1   0   2   0   2
## [217]   0   1   2   0   0   0   2   0   1   1   1   0   0   1   1   1   0   0
## [235]   0   0   0   0   0   4   3   1   0   0   1   0   5   0   1   0   0   4
## [253]   3   0   1   1   1   1   0   3   1   0   0   2   4   0   1   0   1   0
## [271]   3   1   2   1   1   0   1   4   1   3   1   1   0   0   1   1   0   1
## [289]   1   0   2   0   0   1   2   4   0   3   0   0   1   1   0   2   1   4
## [307]   0   0   0   0   3   0   0   1   0   0   1   0   1   0   1   1   0   0
## [325]   0   6   0   0   0   0   2   0   1   2   0   0   1   0   1   0   0   0
## [343]   1   6   0   4   0   1   1   2   0   1   0   0   0   0   0   0   6   1
## [361]   3   1   0   0   0   3   2   3   2   0   2   0   0   2   0   2   0   1
## [379]   1   4   0   2   0   2   0   5   0   2   4   1   0   2   6   0   0   2
## [397]   2   3   4   2   3   0   0   0   0   3   0   0   1   2   0   0   1   0
## [415]   3   0   0   1   1   0   0   1   1   1   0   0   0   0   0   3   5   1
## [433]   0   3   2   0   0   2   1   0   0   0   3   0   1   0   1   0   4   1
## [451]   2   1   1   1   0   0   0

Samples with the SJ of interest:

table(GeneSJ$chr17_7675216_7675993>0) 
## 
## FALSE  TRUE 
##   207   250

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr17_7675216_7675993 > 0])
## 
## MUT  WT 
##   1 249
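The table above counts only samples that carry the junction. A two-way table also shows how many samples in each group lack it. This is a small sketch on invented data; `toy`, `reads` and `has_SJ` are hypothetical names, not columns of the real `GeneSJ`.

```r
# Toy data (invented groups and read counts) standing in for GeneSJ.
toy <- data.frame(GROUP = c("MUT", "WT", "WT", "WT", "MUT"),
                  reads = c(226, 0, 3, 1, 0),
                  stringsAsFactors = FALSE)

# Rows: sample group; columns: whether the junction has at least one read.
tab <- table(GROUP = toy$GROUP, has_SJ = toy$reads > 0)
print(tab)
```

Reading the table by row makes it immediate whether every mutated sample supports the junction or only some of them do.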

Alternative SJ found in the mutated samples.

9.1.2 Acceptor Gain

Search: chr17:7675234-7676033

Show all the splice junctions containing the positions between 7675230 and 7675239

colnames(GeneSJ)[grep("767523",colnames(GeneSJ))]
##  [1] "chr17_7675234_7676033" "chr17_7675237_7675335" "chr17_7675237_7675349"
##  [4] "chr17_7675237_7675602" "chr17_7675237_7675884" "chr17_7675237_7675993"
##  [7] "chr17_7675237_7676193" "chr17_7675237_7676381" "chr17_7675237_7687376"
## [10] "chr17_7675239_7676001"

Found: chr17_7675234_7676033
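The `grep("767523", ...)` lookup matches those digits anywhere in a column name, so in principle it could also hit an end coordinate or part of a longer number. One possible alternative, sketched here as an assumption rather than as the report's method, is to parse the `chr_SJstart_SJend` names into numeric coordinates and filter by an explicit window. The window bounds `lo` and `hi` and the data frame `sj` are hypothetical names.

```r
# Column names copied from the outputs in this section.
sj_names <- c("chr17_7675234_7676033", "chr17_7675237_7675993",
              "chr17_7675239_7676001", "chr17_7675216_7675993")

# Split "chr_start_end" into its three fields.
parts <- do.call(rbind, strsplit(sj_names, "_", fixed = TRUE))
sj <- data.frame(name  = sj_names,
                 chr   = parts[, 1],
                 start = as.integer(parts[, 2]),
                 end   = as.integer(parts[, 3]),
                 stringsAsFactors = FALSE)

# Keep junctions whose start OR end lies inside the window of interest.
lo <- 7675230L
hi <- 7675239L
in_window <- (sj$start >= lo & sj$start <= hi) | (sj$end >= lo & sj$end <= hi)
hits <- sj$name[in_window]
print(hits)
```

Here the junction starting at 7675216 is excluded because neither of its coordinates falls in the 7675230 to 7675239 window.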

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr17_7675234_7676033
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0

Samples with the SJ of interest:

table(GeneSJ$chr17_7675234_7676033>0) 
## 
## FALSE  TRUE 
##   456     1

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr17_7675234_7676033 > 0])
## 
## WT 
##  1

Alternative SJ not found in the mutated samples of the splice junction collection.

9.1.3 Canonical SJ

Canonical DE: chr17:7674972-7675052

Canonical UE: chr17:7675237-7675993; acceptor splice site chr17:7675237

Reads of all the AML samples (mutated and non-mutated) for the canonical DE splice junction:

GeneSJ$chr17_7674972_7675052
##   [1] 141  24 108   9  10  90 187 146  72  53 165 160  39  11  90 106 169 255
##  [19]  43 175  11  89  88  15  75  39  41   2  31 212 177  49 216  88  17  35
##  [37] 122  27  47  37  88  32 219 139  67  93 115   7  57  46  42  13  39 113
##  [55]   3 109  12 149 183 139  34   4  78  37  69   6  35  65 183  59 217  55
##  [73]  20 148  94 125  25 295  31 160 151  91   8 185  33 116  16 161  61  50
##  [91] 136  74  41  49  29 136 158 140  89  96  63  94 158  26 100 317  96  60
## [109]   6  10 111  72  42  60  58 207 193 127  57 167  12 203 163 152  17  21
## [127]  16  66 111  58  60  49 116  62  59  99  35  11 183   8   8 163  12  36
## [145] 281  14  37  32 233 117  96 168  50 213  80  81 196  55  14   5  98  45
## [163]  63  30 170  76  73  52 153  49  39  71  32  11  76   7 155  93 123  79
## [181] 144  71 132  51  21  31  51  17 123  10  44  27  34  77 208  19 119  95
## [199] 120 138  80  22  11  39  91  31  70 131 111  77  47 135  14  99  33  27
## [217]  33 160 105  23  24  41 148  48 153  35  75  36 123  90  71  80  41  74
## [235] 123  68  17  68  62 161  93 100 144 115  82  19  93   0  82  19  18  32
## [253] 134 127  46 146  32 150 173 126 133  32  42 156  84  15 110 189  11  92
## [271] 252  55 154  82  60 100 104 144  28  55 165 110  88  62  25 121  33  10
## [289]  18  20 117  48  65  81  66 225  18 189  31 221  71  12  13  56  99 113
## [307]  24  17  19  23 153  53  67  79 293 113  12  64  43  26  73  47  42  28
## [325] 105 163 102  64  64  16 259  54  46  59  87   1  98 110  57 130  59   9
## [343] 203 221  57 180  66 114  32  93  32  70  23  58  45   8  19 107  60  30
## [361]  98   7  15  12  70 150 129 121  19 240 332  10  62   8   8  62  81  18
## [379] 135 128   2 116  34  29   4 118  21 154 137  54  89 109  80   6  34 184
## [397]  71 126 213 114 322  13  45  97  33 165  88  12  70  98 293  24  59  10
## [415] 170  53  79  70  56   5  53 174  45  66  33 105  26  78 159 123 104 105
## [433]   0  78  53  49 185  39 176  21  17  15 160 132 163  79  30  27 366  96
## [451]  75 104 118 122  14   5  97

Reads of all the AML samples (mutated and non-mutated) for the canonical UE splice junction:

GeneSJ$chr17_7675237_7675993
##   [1]  390   69  315    4   84  356  962  504  248  329  471  402  109   63  472
##  [16]  300  293  660  167  492  103  417  267   21  143   92  152    4   86  356
##  [31]  963  140  616  286  148   92  341   85  138  117  506   21  385  408  189
##  [46]  289  287   25  195  172  142   51  106  561   20  550   40  238  549  339
##  [61]  134   28  173  261  191   24  227  177  612  148  261  165   78  417  316
##  [76]  354   79  894  108  492  909  257   38  614  119  835   77  481  136  143
##  [91]  719  436  176  104   90  446  388  406  286  392  267  658  488   76  388
## [106]  637  295  204   18   12  306  338  131  350  182  583 1042  278  152  435
## [121]   44  384  452  514   56  101   38  205  336  151  173  154  307  329  176
## [136]  308  113   17  540   21   21  395   38  102  667   52  109   82  659  442
## [151]  332  477  136  522  304  316  588  168   61   57  332   78  180   90 1235
## [166]  178  211  149  415  146   66  159  173   36  229    8  942  328  387  144
## [181]  564  236  876  139  110   75  221   62  354   10  372  154  114  256  602
## [196]   37  356  280  435  316  197   55   23   67  261   97  463  387  321  150
## [211]  298  269   60  446   54   50  101  532  259   80   62  135  416   75  435
## [226]   78  260  118  382  331  174  277  141  227  818  189   19  188  203  612
## [241]  243  178  445  463  307   75  233    4  251  128   71  120  422  422  154
## [256]  264  111  245  593  339  396  134  135  484  588   40  363  518   48  260
## [271]  788  157  403  352  225  189  519  374  105  369  447  293  165  227   91
## [286]  213   69   56  145   35  437  145  189  132  171  316   52  553   94  637
## [301]  114   31   59  173  416  387   70   48   56   61  394  196  218  247  462
## [316]  357   72  143  125  143  176  115   95   80  352 1359  337  211  555   49
## [331]  390  116  115  170  172    5  315  321  214  271  218   38  586  748  166
## [346] 1241  134  497  232  277   34  222   51  155  125   47   45  193  195  245
## [361]  551   31   72   38  184  378  367  411  126  424  581   24  159   47   28
## [376]  170  228   43  419  436    2  366  101   89   11  361   92  373  414  153
## [391]  255  359  576   12   65  416  200  426  572  310  527   59  108  331   76
## [406]  912  208   34  331  375  541   54  189   19  506   96  410  367  280   42
## [421]  261  269  138  164  112  296   84  465  549  371  577  392    6  253  250
## [436]  101  572  102  418   27   41   23  416  439  237  176   81   64  654  622
## [451]  161  289  408  317   87    8  315

9.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr17_7674972_7675052)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonUE <- (GeneSJ$chr17_7675237_7675993)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_AG <- (GeneSJ$chr17_7675216_7675993)/GeneSJ$rowSum_SJtotal*100
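The normalization expresses each junction as a percentage of all junction reads of the gene in that sample. The sketch below reproduces the idea on invented counts. It adds two assumptions that are not in the report: the pattern is anchored as `"^chr"` so only junction columns are summed, and samples with zero junction reads get `NA` instead of `NaN`. The column names `chr17_100_200`, `chr17_100_300` and `Normalized_alt` are hypothetical.

```r
# Toy junction table (invented counts); real columns follow chr_SJstart_SJend.
toy <- data.frame(sample_id     = c("S01", "S02", "S03"),
                  chr17_100_200 = c(90, 0, 40),
                  chr17_100_300 = c(10, 0, 10),
                  stringsAsFactors = FALSE)

# Anchor the pattern so that only junction columns are selected.
sj_cols <- grep("^chr", names(toy))
toy$rowSum_SJtotal <- rowSums(toy[, sj_cols])

# Percentage of the gene's junction reads; NA when the gene has no junction reads.
toy$Normalized_alt <- ifelse(toy$rowSum_SJtotal > 0,
                             toy$chr17_100_300 / toy$rowSum_SJtotal * 100,
                             NA_real_)
print(toy)
```

In this toy table the second sample has no junction reads at all, so its normalized value is left undefined rather than reported as zero.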

Download the normalized values for the assessed splice junctions of all the AML samples:

9.3 VAF

VAF of the mutated samples:

9.4 Plots

9.4.1 Static Dot Plots

Canonical splice junction chr17:7675237-7675993, acceptor splice site chr17:7675237

Splicing alterations:

9.4.2 Interactive Dot Plots

Canonical splice junction chr17:7675237-7675993, acceptor splice site chr17:7675237

Splicing alteration:

9.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

9.5 Statistical Analysis

SJCounts <- GeneSJ

9.5.1 Acceptor Loss

9.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"]
## W = 0.85443, p-value < 2.2e-16
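A very small Shapiro-Wilk p-value rejects the hypothesis that the WT values are normally distributed, which is presumably why the report goes on to use an empirical distribution rather than a z-score. The sketch below shows the same kind of outcome on simulated right-skewed data; the simulated vector `skewed` is an assumption for illustration only.

```r
set.seed(1)                  # reproducible toy data
skewed <- rexp(200)          # right-skewed values, loosely mimicking junction usage

sw <- shapiro.test(skewed)   # H0: the sample comes from a normal distribution
print(sw$p.value < 0.05)     # normality is rejected for this skewed sample
```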

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 12.83588

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonUE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 5.270598

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonUE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -7.565282

9.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:453] = -10.017, -9.1322, -8.6692,  ..., 11.746, 25.626
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.006578947
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonUE")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr17:7675237-7675993"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
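For a loss of the canonical junction the lower tail is the relevant one, so the ECDF value itself is used as the p-value, whereas gains of an alternative junction use `1 - ECDF`. The helper below is a hypothetical sketch (it is not part of the report) that makes the tail choice explicit; the function name `empirical_pvalue` and the toy values are assumptions.

```r
# Hypothetical helper, not part of the original report.
empirical_pvalue <- function(wt, mut, tail = c("upper", "lower")) {
  tail <- match.arg(tail)
  f <- ecdf(wt)                        # fraction of WT values <= x
  if (tail == "lower") f(mut) else 1 - f(mut)
}

wt <- c(8, 10, 12, 13, 15)             # invented WT normalized expression values

p_loss <- empirical_pvalue(wt, 5,  "lower")   # no WT value is as low as 5
p_gain <- empirical_pvalue(wt, 20, "upper")   # no WT value exceeds 20
print(c(loss = p_loss, gain = p_gain))
```

Keeping the tail as an explicit argument avoids accidentally applying the gain formula to a loss event, or the reverse, when the same code is reused across sections.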

Download the VAF, inferred percentiles and p-values of the mutated samples:

9.5.2 Acceptor Gain

9.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_AG[SJCounts$GROUP == "WT"]
## W = 0.72791, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.05448304

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_AG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.563813

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_AG - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 4.50933

9.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:247] = -0.054483, -0.034956, -0.034744,  ..., 0.37748, 0.53203
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_AG")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")
MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr17:7675216-7675993"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

10 TP53 chr17,7675217,T,C

Variant found in 4 patients of the BeatAML cohort (4 samples).

  • Patients with TP53 chr17,7675217,T,C variant: 4 patients (4 samples)
  • Patients with the variant and RNASeq for validation: 4 patients (4 samples)

The splicing alterations being assessed are:

  • Acceptor Loss: chr17:7675237-7675993; acceptor splice site chr17:7675237.
  • Acceptor Gain: chr17:7675216-7675993, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"TP53_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="TP53" & found_variants$MutationKey_Hg38 == "chr17,7675217,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

10.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

10.1.1 Acceptor Gain

Search: chr17:7675216-7675993

Show all the splice junctions containing the position chr17:7675216-7675993

colnames(GeneSJ)[grep("7675216_7675993",colnames(GeneSJ))]
## [1] "chr17_7675216_7675993"

Found: chr17_7675216_7675993

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr17_7675216_7675993
##   [1]   1   0   0   0   0   2   3   1   0   0   3   1   2   0   1   2   0   4
##  [19]   0   0   0   1   0   0   1   0   2   0   1   3   0   2   3   0   0   0
##  [37]   1   1   1   0   2   0   2   4   3   1   0   0   1   1   0   0   3   3
##  [55]   0   1   0   0   4   0   0   0   0   0   1   0   1   0   0   1 226   0
##  [73]   1   2   0   1   1   7   0   2   4   0   0   3   0   3   0   3   1   1
##  [91]   5   3   1   0   0   1   3   2   2   1   1   1   2   0   2   4   0   0
## [109]   0   0   0   0   1   4   0   3   4   0   2   3   1   1   2   2   0   0
## [127]   0   0   1   1   0   0   0   2   0   1   2   0   2   0   0   1   0   0
## [145]   1   0   1   2   2   2   6   1   3   5   1   1   1   3   0   0   2   0
## [163]   2   2   5   0   0   0   0   0   0   1   2   0   0   0   6   1   2   3
## [181]   5   1   6   1   0   1   1   1   0   0   3   0   1   3   1   1   1   1
## [199]   2   3   3   0   0   1   1   0   0   1   1   0   2   1   0   2   0   2
## [217]   0   1   2   0   0   0   2   0   1   1   1   0   0   1   1   1   0   0
## [235]   0   0   0   0   0   4   3   1   0   0   1   0   5   0   1   0   0   4
## [253]   3   0   1   1   1   1   0   3   1   0   0   2   4   0   1   0   1   0
## [271]   3   1   2   1   1   0   1   4   1   3   1   1   0   0   1   1   0   1
## [289]   1   0   2   0   0   1   2   4   0   3   0   0   1   1   0   2   1   4
## [307]   0   0   0   0   3   0   0   1   0   0   1   0   1   0   1   1   0   0
## [325]   0   6   0   0   0   0   2   0   1   2   0   0   1   0   1   0   0   0
## [343]   1   6   0   4   0   1   1   2   0   1   0   0   0   0   0   0   6   1
## [361]   3   1   0   0   0   3   2   3   2   0   2   0   0   2   0   2   0   1
## [379]   1   4   0   2   0   2   0   5   0   2   4   1   0   2   6   0   0   2
## [397]   2   3   4   2   3   0   0   0   0   3   0   0   1   2   0   0   1   0
## [415]   3   0   0   1   1   0   0   1   1   1   0   0   0   0   0   3   5   1
## [433]   0   3   2   0   0   2   1   0   0   0   3   0   1   0   1   0   4   1
## [451]   2   1   1   1   0   0   0

Samples with the SJ of interest:

table(GeneSJ$chr17_7675216_7675993>0) 
## 
## FALSE  TRUE 
##   207   250

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr17_7675216_7675993 > 0])
## 
## MUT  WT 
##   1 249

Alternative SJ found in 1 of the 4 mutated samples.

10.1.2 Canonical SJ

Canonical DE: chr17:7674972-7675052

Canonical UE: chr17:7675237-7675993; acceptor splice site chr17:7675237

Reads of all the AML samples (mutated and non-mutated) for the canonical DE splice junction:

GeneSJ$chr17_7674972_7675052
##   [1] 141  24 108   9  10  90 187 146  72  53 165 160  39  11  90 106 169 255
##  [19]  43 175  11  89  88  15  75  39  41   2  31 212 177  49 216  88  17  35
##  [37] 122  27  47  37  88  32 219 139  67  93 115   7  57  46  42  13  39 113
##  [55]   3 109  12 149 183 139  34   4  78  37  69   6  35  65 183  59 217  55
##  [73]  20 148  94 125  25 295  31 160 151  91   8 185  33 116  16 161  61  50
##  [91] 136  74  41  49  29 136 158 140  89  96  63  94 158  26 100 317  96  60
## [109]   6  10 111  72  42  60  58 207 193 127  57 167  12 203 163 152  17  21
## [127]  16  66 111  58  60  49 116  62  59  99  35  11 183   8   8 163  12  36
## [145] 281  14  37  32 233 117  96 168  50 213  80  81 196  55  14   5  98  45
## [163]  63  30 170  76  73  52 153  49  39  71  32  11  76   7 155  93 123  79
## [181] 144  71 132  51  21  31  51  17 123  10  44  27  34  77 208  19 119  95
## [199] 120 138  80  22  11  39  91  31  70 131 111  77  47 135  14  99  33  27
## [217]  33 160 105  23  24  41 148  48 153  35  75  36 123  90  71  80  41  74
## [235] 123  68  17  68  62 161  93 100 144 115  82  19  93   0  82  19  18  32
## [253] 134 127  46 146  32 150 173 126 133  32  42 156  84  15 110 189  11  92
## [271] 252  55 154  82  60 100 104 144  28  55 165 110  88  62  25 121  33  10
## [289]  18  20 117  48  65  81  66 225  18 189  31 221  71  12  13  56  99 113
## [307]  24  17  19  23 153  53  67  79 293 113  12  64  43  26  73  47  42  28
## [325] 105 163 102  64  64  16 259  54  46  59  87   1  98 110  57 130  59   9
## [343] 203 221  57 180  66 114  32  93  32  70  23  58  45   8  19 107  60  30
## [361]  98   7  15  12  70 150 129 121  19 240 332  10  62   8   8  62  81  18
## [379] 135 128   2 116  34  29   4 118  21 154 137  54  89 109  80   6  34 184
## [397]  71 126 213 114 322  13  45  97  33 165  88  12  70  98 293  24  59  10
## [415] 170  53  79  70  56   5  53 174  45  66  33 105  26  78 159 123 104 105
## [433]   0  78  53  49 185  39 176  21  17  15 160 132 163  79  30  27 366  96
## [451]  75 104 118 122  14   5  97

Reads of all the AML samples (mutated and non-mutated) for the canonical UE splice junction:

GeneSJ$chr17_7675237_7675993
##   [1]  390   69  315    4   84  356  962  504  248  329  471  402  109   63  472
##  [16]  300  293  660  167  492  103  417  267   21  143   92  152    4   86  356
##  [31]  963  140  616  286  148   92  341   85  138  117  506   21  385  408  189
##  [46]  289  287   25  195  172  142   51  106  561   20  550   40  238  549  339
##  [61]  134   28  173  261  191   24  227  177  612  148  261  165   78  417  316
##  [76]  354   79  894  108  492  909  257   38  614  119  835   77  481  136  143
##  [91]  719  436  176  104   90  446  388  406  286  392  267  658  488   76  388
## [106]  637  295  204   18   12  306  338  131  350  182  583 1042  278  152  435
## [121]   44  384  452  514   56  101   38  205  336  151  173  154  307  329  176
## [136]  308  113   17  540   21   21  395   38  102  667   52  109   82  659  442
## [151]  332  477  136  522  304  316  588  168   61   57  332   78  180   90 1235
## [166]  178  211  149  415  146   66  159  173   36  229    8  942  328  387  144
## [181]  564  236  876  139  110   75  221   62  354   10  372  154  114  256  602
## [196]   37  356  280  435  316  197   55   23   67  261   97  463  387  321  150
## [211]  298  269   60  446   54   50  101  532  259   80   62  135  416   75  435
## [226]   78  260  118  382  331  174  277  141  227  818  189   19  188  203  612
## [241]  243  178  445  463  307   75  233    4  251  128   71  120  422  422  154
## [256]  264  111  245  593  339  396  134  135  484  588   40  363  518   48  260
## [271]  788  157  403  352  225  189  519  374  105  369  447  293  165  227   91
## [286]  213   69   56  145   35  437  145  189  132  171  316   52  553   94  637
## [301]  114   31   59  173  416  387   70   48   56   61  394  196  218  247  462
## [316]  357   72  143  125  143  176  115   95   80  352 1359  337  211  555   49
## [331]  390  116  115  170  172    5  315  321  214  271  218   38  586  748  166
## [346] 1241  134  497  232  277   34  222   51  155  125   47   45  193  195  245
## [361]  551   31   72   38  184  378  367  411  126  424  581   24  159   47   28
## [376]  170  228   43  419  436    2  366  101   89   11  361   92  373  414  153
## [391]  255  359  576   12   65  416  200  426  572  310  527   59  108  331   76
## [406]  912  208   34  331  375  541   54  189   19  506   96  410  367  280   42
## [421]  261  269  138  164  112  296   84  465  549  371  577  392    6  253  250
## [436]  101  572  102  418   27   41   23  416  439  237  176   81   64  654  622
## [451]  161  289  408  317   87    8  315

10.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr17_7674972_7675052)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonUE <- (GeneSJ$chr17_7675237_7675993)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_AG <- (GeneSJ$chr17_7675216_7675993)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

10.3 VAF

VAF of the mutated samples:

10.4 Plots

10.4.1 Static Dot Plots

Canonical splice junction chr17:7675237-7675993, acceptor splice site chr17:7675237

Splicing alterations:

10.4.2 Interactive Dot Plots

Canonical splice junction chr17:7675237-7675993, acceptor splice site chr17:7675237

Splicing alteration:

10.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

10.5 Statistical Analysis

SJCounts <- GeneSJ

10.5.1 Acceptor Loss

10.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"]
## W = 0.85347, p-value < 2.2e-16

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 12.82575

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonUE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 18.781726  5.270598 11.630769 12.683578

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonUE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  5.9559749 -7.5551532 -1.1949817 -0.1421729

10.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:450] = -10.007, -9.122, -8.6591,  ..., 11.757, 25.636
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.982339956 0.006622517 0.280353201 0.494481236
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonUE")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr17:7675237-7675993"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

10.5.2 Acceptor Gain

10.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_AG[SJCounts$GROUP == "WT"]
## W = 0.72981, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.05484385

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_AG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.000000 4.563813 0.000000 0.000000

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_AG - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.05484385  4.50896875 -0.05484385 -0.05484385

10.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:247] = -0.054844, -0.035316, -0.035104,  ..., 0.37712, 0.53167
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.4503311 1.0000000 0.4503311 0.4503311
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_AG")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr17:7675216-7675993"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
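The last two lines mask the ECDF and p-value of mutated samples that have no reads for the junction, since their percentile only reflects the share of WT samples that also have zero reads. The sketch below replays that step on the four values printed above; `MUT_toy` and `no_reads` are hypothetical names, and logical indexing is used here as an equivalent alternative to `ifelse()`.

```r
# Values copied from the outputs above for the 4 MUT samples.
MUT_toy <- data.frame(NormalizedExpression = c(0, 4.563813, 0, 0),
                      ECDF = c(0.4503311, 1, 0.4503311, 0.4503311))
MUT_toy$Pvalue <- 1 - MUT_toy$ECDF

# Samples without reads for the junction get NA instead of an uninformative value.
no_reads <- MUT_toy$NormalizedExpression == 0
MUT_toy$ECDF[no_reads]   <- NA
MUT_toy$Pvalue[no_reads] <- NA
print(MUT_toy)
```

Only the second sample keeps a p-value, which matches the single mutated sample that carries the alternative junction.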

Download the VAF, inferred percentiles and p-values of the mutated samples:

11 TP53 chr17,7674872,T,C

Variant found in 6 patients of the BeatAML cohort (7 samples).

  • Patients with TP53 chr17,7674872,T,C variant: 6 patients (7 samples)
  • Patients with the variant and RNASeq for validation: 5 patients (5 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 1bp from the variant, chr17,7674872, not found in the splice junction collection.
  • Exon Skipping: chr17:7674291-7675052, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"TP53_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample's group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="TP53" & found_variants$MutationKey_Hg38 == "chr17,7674872,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

11.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

11.1.1 Donor Gain

Search: predicted at 1bp from the variant position, chr17,7674872

Show all the splice junctions containing the positions between 7674870 and 7674879

colnames(GeneSJ)[grep("767487",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.
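An empty result such as `character(0)` can be handled explicitly so that later steps are skipped for junctions that are absent from the collection. This is a minimal sketch under the assumption that the lookup is done on a plain vector of column names; `sj_names` holds two junction names taken from this report and `hits` is a hypothetical name.

```r
# Two junction names from this report; neither contains "767487".
sj_names <- c("chr17_7674291_7675052", "chr17_7674972_7675052")

# Literal (non-regex) substring search, returning the matching names.
hits <- sj_names[grep("767487", sj_names, fixed = TRUE)]

if (length(hits) == 0) {
  message("Alternative SJ not found in the splice junction collection.")
} else {
  print(hits)
}
```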

11.1.2 Exon Skipping

Search: chr17:7674291-7675052

Show all the splice junctions containing the position chr17:7674291-7675052

colnames(GeneSJ)[grep("7674291_7675052",colnames(GeneSJ))]
## [1] "chr17_7674291_7675052"

Found: chr17_7674291_7675052

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr17_7674291_7675052
##   [1]  0  0  0  0  0  0  0  2  2  1  0  1  0  0  1  2  3  0  0  1  0  1  0  1  8
##  [26]  0  0  0  4  1  1  3  2  1  0  1  1  2  1  0  0  4  1  3  0  1  1  1  1  0
##  [51]  2  2  3  1  0  1  0  2 15  1  4  2  0  2  0  1  1  7  0  1  4  2  0  3  0
##  [76]  0  1  2  1  1  3  0  1  3  1  0  0  0  3  0  4  0  2  2  4  1  0  0  2  1
## [101]  6  0  1  0  2  2  2  2  0  0  1  5  0  1  2  1  1  0  0  1  1  1  0  1  2
## [126]  0  2  1  0  4  2  2  0  2  0  0  3  0  1  1  0  0  1  1  0  2  1  4  1  1
## [151]  0  5  2  2  0  0  6  0  2  2  0  3  1  1  5  0  1  0  0  0  8  2  0  1  1
## [176]  1  1  0  0  0  2  0  5  3  1  0  4  1  1  0  0  5  0  1  2  3  2  1  1  0
## [201]  0  3  2  1  1  0  1  0  1  1  5  4  1  1  2  0  5  0  3  1  4  1  2  0  1
## [226]  1  1  1  4  0  0  1  2  3  4  1  1  0  2  0  1  2  5  1  1  3  4  0  2  1
## [251]  1  3  3  0  2  0  2  0  1  0  2  8  3  0  2  0  0  0  1  1  2  1  1  4  0
## [276]  0  1  1  0  0  2  2  4  2  1  1  5  0  0  1  2 13  0  0  0  1  0  0  0  0
## [301]  0  0  2  0  2  0  1  0  0  3  0  1  5  1  0  1  2  0  2  2  1  0  7  1  1
## [326]  2  0  0  0  0  5  1  0  0  3  0  0  1  2  0  1  1  1  0  0  2  2  1  0  2
## [351]  0  0  1  0  1  0  7  1  1  5  1  2  0  4  3  0  0  7  9  0  0  4  1  3  3
## [376]  2  2  1  0  0  0  1  1  2  0  0  1  1  0  0  1  0  1  0  0  0  1  0  2  0
## [401]  0  0  2  1  2  1  0  1  2  1  1  0  0  0  0  1  1  0  0  1  1  0  1  1  2
## [426]  5 15  1  1  0  0  2  0  0  1  0  0  1  0  2  0  0  0  1  0  0  0  0  1  1
## [451]  0  1  0  2  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr17_7674291_7675052>0) 
## 
## FALSE  TRUE 
##   179   278

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr17_7674291_7675052 > 0])
## 
## MUT  WT 
##   4 274

Alternative SJ found in the mutated samples.
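
The table above counts only the samples that carry the junction, so it does not show how many samples of each group lack it. A two-way table of group against junction presence gives both numbers at once. The sketch below uses toy data; the data frame `toy` and the column `has_sj` are hypothetical stand-ins for `GeneSJ`.

```r
# Toy data standing in for GeneSJ (values are illustrative only)
toy <- data.frame(
  GROUP = c("MUT", "MUT", "WT", "WT", "WT", "WT"),
  reads = c(2, 0, 1, 0, 0, 0)
)
toy$has_sj <- toy$reads > 0

# Rows: group; columns: junction absent (FALSE) or present (TRUE)
presence_by_group <- table(toy$GROUP, toy$has_sj)
presence_by_group

# Fraction of each group with at least one supporting read
prop_with_sj <- prop.table(presence_by_group, margin = 1)[, "TRUE"]
prop_with_sj
```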

11.1.3 Canonical SJ

Exon upstream (UE): chr17:7674972-7675052

Exon downstream (DE): chr17:7674291-7674858

Reads of all the AML samples (mutated and non-mutated) for the upstream canonical splice junction (UE):

GeneSJ$chr17_7674972_7675052
##   [1] 141  24 108   9  10  90 187 146  72  53 165 160  39  11  90 106 169 255
##  [19]  43 175  11  89  88  15  75  39  41   2  31 212 177  49 216  88  17  35
##  [37] 122  27  47  37  88  32 219 139  67  93 115   7  57  46  42  13  39 113
##  [55]   3 109  12 149 183 139  34   4  78  37  69   6  35  65 183  59 217  55
##  [73]  20 148  94 125  25 295  31 160 151  91   8 185  33 116  16 161  61  50
##  [91] 136  74  41  49  29 136 158 140  89  96  63  94 158  26 100 317  96  60
## [109]   6  10 111  72  42  60  58 207 193 127  57 167  12 203 163 152  17  21
## [127]  16  66 111  58  60  49 116  62  59  99  35  11 183   8   8 163  12  36
## [145] 281  14  37  32 233 117  96 168  50 213  80  81 196  55  14   5  98  45
## [163]  63  30 170  76  73  52 153  49  39  71  32  11  76   7 155  93 123  79
## [181] 144  71 132  51  21  31  51  17 123  10  44  27  34  77 208  19 119  95
## [199] 120 138  80  22  11  39  91  31  70 131 111  77  47 135  14  99  33  27
## [217]  33 160 105  23  24  41 148  48 153  35  75  36 123  90  71  80  41  74
## [235] 123  68  17  68  62 161  93 100 144 115  82  19  93   0  82  19  18  32
## [253] 134 127  46 146  32 150 173 126 133  32  42 156  84  15 110 189  11  92
## [271] 252  55 154  82  60 100 104 144  28  55 165 110  88  62  25 121  33  10
## [289]  18  20 117  48  65  81  66 225  18 189  31 221  71  12  13  56  99 113
## [307]  24  17  19  23 153  53  67  79 293 113  12  64  43  26  73  47  42  28
## [325] 105 163 102  64  64  16 259  54  46  59  87   1  98 110  57 130  59   9
## [343] 203 221  57 180  66 114  32  93  32  70  23  58  45   8  19 107  60  30
## [361]  98   7  15  12  70 150 129 121  19 240 332  10  62   8   8  62  81  18
## [379] 135 128   2 116  34  29   4 118  21 154 137  54  89 109  80   6  34 184
## [397]  71 126 213 114 322  13  45  97  33 165  88  12  70  98 293  24  59  10
## [415] 170  53  79  70  56   5  53 174  45  66  33 105  26  78 159 123 104 105
## [433]   0  78  53  49 185  39 176  21  17  15 160 132 163  79  30  27 366  96
## [451]  75 104 118 122  14   5  97

Reads of all the AML samples (mutated and non-mutated) for the downstream canonical splice junction (DE):

GeneSJ$chr17_7674291_7674858
##   [1] 401  85 346   5  45 318 647 433 220 197 388 393  82  35 270 310 242 692
##  [19] 136 461  37 326 216  25  99  97 154   7  49 347 603 109 585 256  85  76
##  [37] 289  77 125 102 371  76 407 356 178 279 297  16 152 148  95  49 108 287
##  [55]  17 423  27 193 491 369 124   9 158 134 175  23 153 134 478 114 501 141
##  [73]  66 428 285 344  77 803  78 456 540 234  28 535  90 472  47 374 112 117
##  [91] 524 301 167 105  81 391 364 308 249 357 236 337 432  80 314 462 272 168
## [109]  15  16 306 257 118 223 128 517 731 243 152 360  49 358 371 463   2  78
## [127]  20 145 303 173 152 153 268 248 154 264  83  18 504  17  21 398  33  88
## [145] 556  43 105  79 628 403 297 388 132 464 226 303 618 175  40  28 276  80
## [163] 107  85 691 144 210 113 349 131  53 156 143  18 187  11 598 268 391 126
## [181] 444 177 473 132  64  72 218  65 334  27 141  62  85 195 593  33 299 253
## [199] 409 242 164  41  30  55 205  87 230 342 282 130 205 210  34 338  46  32
## [217]  85 469 262  70  45 101 379  81 427  57 183  80 316 307 151 225 110 211
## [235] 484 199  21 130 174 449 258 155 355 361 239  54 192   1 209  93  40  90
## [253] 359 389 148 254 102 230 507 309 344  99 108 499 338  21 330 454  30 244
## [271] 807 137 316 302 211 172 388 297 100 239 371 269 131 207  64 194  41  36
## [289]  66  45 359  96 137 138 148 349  31 496  72 512 100  21  69 162 376 368
## [307]  66  44  47  52 424 166 207 224 507 325  52 190 119 105 155 131  80  83
## [325] 301 656 231 185 320  57 419 118 116 130 132   1 236 281 195 233 198  25
## [343] 479 711 141 680 142 380 140 249  38 207  42 146 119  31  42 143 197 141
## [361] 341  26  65  34 140 306 334 395  73 380 633  29 155  28  19 162 201  36
## [379] 371 308   4 289  92  73   5 307  84 321 385 117 203 299 320  15  44 342
## [397] 170 360 510 296 475  25 109 259  89 702 198  47 263 263 492  34 133  24
## [415] 460  76 288 253 204  17 129 299 150 172  95 272  41 253 506 385 316 328
## [433]   2 232 184  80 495  97 335  33  28  24 374 427 254 155  88  47 544 386
## [451] 152 306 350 265  49   8 261

11.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonUE <- (GeneSJ$chr17_7674972_7675052)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonDE <- (GeneSJ$chr17_7674291_7674858)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES <- (GeneSJ$chr17_7674291_7675052)/GeneSJ$rowSum_SJtotal*100
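
The normalization expresses each junction as a percentage of all junction reads of the gene in that sample. A sample with no junction reads at all would give 0/0, which R reports as `NaN`. The sketch below shows one way to guard against that on toy data. The helper `normalize_sj` and the `chrT_*` columns are hypothetical, and the anchored pattern is an assumption about how junction columns could be selected more strictly than with `grep("chr", ...)`.

```r
# Hypothetical helper: percentage of a gene's junction reads carried by one junction.
# Returns NA (rather than NaN) when the sample has no junction reads at all.
normalize_sj <- function(sj_reads, total_reads) {
  ifelse(total_reads > 0, sj_reads / total_reads * 100, NA_real_)
}

# Toy counts for three samples and three junctions (illustrative only)
toy <- data.frame(
  chrT_10_20 = c(90, 0, 45),
  chrT_10_40 = c(10, 0, 5),
  chrT_30_40 = c(100, 0, 50)
)

# Select junction columns only, so derived columns are never summed by accident
sj_cols <- grep("^chrT_[0-9]+_[0-9]+$", names(toy))
toy$rowSum_SJtotal <- rowSums(toy[, sj_cols])
toy$Normalized_ES  <- normalize_sj(toy$chrT_10_40, toy$rowSum_SJtotal)
toy$Normalized_ES
```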

11.3 VAF

Mutated samples vaf:

11.4 Plots

11.4.1 Static Dot Plots

Splicing alterations:

11.4.2 Interactive Dot Plots

Splicing alteration:

11.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

11.5 Statistical Analysis

SJCounts <- GeneSJ

11.5.1 Exon Skipping

11.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "WT"]
## W = 0.58035, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1275619

Normalized Expression Values of the Alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_ES[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.12165450 0.00000000 0.29112082 0.09465215 0.02803476

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.005907383 -0.127561884  0.163558931 -0.032909731 -0.099527121

11.5.1.2 ECDF

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:269] = -0.12756, -0.1131, -0.1117,  ..., 1.3909, 1.7266
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.7256637 0.3938053 0.8694690 0.6836283 0.4446903
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- NA

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr17:7674291-7675052"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)

11.5.1.3 Mann Whitney U

Normality:

shapiro.test(SJCounts$Normalized_ES[SJCounts$GROUP== "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "WT"]
## W = 0.58035, p-value < 2.2e-16
shapiro.test(SJCounts$Normalized_ES[SJCounts$GROUP== "MUT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "MUT"]
## W = 0.89878, p-value = 0.4032

Mann-Whitney:

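wt <- wilcox.test(x=SJCounts$Normalized_ES[SJCounts$GROUP== "MUT"], 
                  y=SJCounts$Normalized_ES[SJCounts$GROUP== "WT"],
                  alternative = "two.sided", 
                  paired = FALSE, 
                  conf.int = TRUE,     # logical flag: compute the confidence interval
                  conf.level = 0.95)   # confidence level of that interval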

wt
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "MUT"] and SJCounts$Normalized_ES[SJCounts$GROUP == "WT"]
## W = 1320, p-value = 0.5057
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.06114993  0.09467616
## sample estimates:
## difference in location 
##             0.02797629
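
The object returned by `wilcox.test` can be queried directly, which is convenient when the same test is repeated for several variants. In `wilcox.test`, `conf.int` is a logical flag and the level of the interval is set with `conf.level`. The sketch below runs the test on toy values and collects the key numbers in one row; the vectors and the `summary_row` layout are hypothetical and not BeatAML results.

```r
# Toy normalized expression values (illustrative only, not BeatAML data)
mut_expr <- c(0.12, 0.05, 0.29, 0.09, 0.03)
wt_expr  <- c(0.00, 0.10, 0.15, 0.02, 0.40, 0.08, 0.11, 0.01, 0.20, 0.07)

res <- wilcox.test(x = mut_expr, y = wt_expr,
                   alternative = "two.sided",
                   paired = FALSE,
                   conf.int = TRUE,     # request the location-shift estimate and its CI
                   conf.level = 0.95)

# Collect the test result in a one-row summary
summary_row <- data.frame(
  W        = unname(res$statistic),
  p_value  = res$p.value,
  shift    = unname(res$estimate),
  ci_lower = res$conf.int[1],
  ci_upper = res$conf.int[2]
)
summary_row
```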

12 TP53 chr17,7674894,G,A

Variant found in 2 patients of the BeatAML (2 samples)

  • Patients with TP53 chr17,7674894,G,A variant: 2 patients (2 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Loss: chr17:7674291-7674858; donor splice site chr17:7674858.
  • Exon Skipping: chr17:7674291-7675052, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"TP53_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: Mutated (MUT) or Non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="TP53" & found_variants$MutationKey_Hg38 == "chr17,7674894,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

12.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

12.1.1 Exon Skipping

Search: chr17:7674291-7675052

Show all the splice junctions containing the position chr17:7674291-7675052

colnames(GeneSJ)[grep("7674291_7675052",colnames(GeneSJ))]
## [1] "chr17_7674291_7675052"

Found: chr17_7674291_7675052

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr17_7674291_7675052
##   [1]  0  0  0  0  0  0  0  2  2  1  0  1  0  0  1  2  3  0  0  1  0  1  0  1  8
##  [26]  0  0  0  4  1  1  3  2  1  0  1  1  2  1  0  0  4  1  3  0  1  1  1  1  0
##  [51]  2  2  3  1  0  1  0  2 15  1  4  2  0  2  0  1  1  7  0  1  4  2  0  3  0
##  [76]  0  1  2  1  1  3  0  1  3  1  0  0  0  3  0  4  0  2  2  4  1  0  0  2  1
## [101]  6  0  1  0  2  2  2  2  0  0  1  5  0  1  2  1  1  0  0  1  1  1  0  1  2
## [126]  0  2  1  0  4  2  2  0  2  0  0  3  0  1  1  0  0  1  1  0  2  1  4  1  1
## [151]  0  5  2  2  0  0  6  0  2  2  0  3  1  1  5  0  1  0  0  0  8  2  0  1  1
## [176]  1  1  0  0  0  2  0  5  3  1  0  4  1  1  0  0  5  0  1  2  3  2  1  1  0
## [201]  0  3  2  1  1  0  1  0  1  1  5  4  1  1  2  0  5  0  3  1  4  1  2  0  1
## [226]  1  1  1  4  0  0  1  2  3  4  1  1  0  2  0  1  2  5  1  1  3  4  0  2  1
## [251]  1  3  3  0  2  0  2  0  1  0  2  8  3  0  2  0  0  0  1  1  2  1  1  4  0
## [276]  0  1  1  0  0  2  2  4  2  1  1  5  0  0  1  2 13  0  0  0  1  0  0  0  0
## [301]  0  0  2  0  2  0  1  0  0  3  0  1  5  1  0  1  2  0  2  2  1  0  7  1  1
## [326]  2  0  0  0  0  5  1  0  0  3  0  0  1  2  0  1  1  1  0  0  2  2  1  0  2
## [351]  0  0  1  0  1  0  7  1  1  5  1  2  0  4  3  0  0  7  9  0  0  4  1  3  3
## [376]  2  2  1  0  0  0  1  1  2  0  0  1  1  0  0  1  0  1  0  0  0  1  0  2  0
## [401]  0  0  2  1  2  1  0  1  2  1  1  0  0  0  0  1  1  0  0  1  1  0  1  1  2
## [426]  5 15  1  1  0  0  2  0  0  1  0  0  1  0  2  0  0  0  1  0  0  0  0  1  1
## [451]  0  1  0  2  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr17_7674291_7675052>0) 
## 
## FALSE  TRUE 
##   179   278

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr17_7674291_7675052 > 0])
## 
## MUT  WT 
##   1 277

Alternative SJ found in the mutated samples.

12.1.2 Canonical SJ

Exon upstream (UE): chr17:7674972-7675052

Exon downstream (DE): chr17:7674291-7674858, splice site chr17:7674858

Reads of all the AML samples (mutated and non-mutated) for the upstream canonical splice junction (UE):

GeneSJ$chr17_7674972_7675052
##   [1] 141  24 108   9  10  90 187 146  72  53 165 160  39  11  90 106 169 255
##  [19]  43 175  11  89  88  15  75  39  41   2  31 212 177  49 216  88  17  35
##  [37] 122  27  47  37  88  32 219 139  67  93 115   7  57  46  42  13  39 113
##  [55]   3 109  12 149 183 139  34   4  78  37  69   6  35  65 183  59 217  55
##  [73]  20 148  94 125  25 295  31 160 151  91   8 185  33 116  16 161  61  50
##  [91] 136  74  41  49  29 136 158 140  89  96  63  94 158  26 100 317  96  60
## [109]   6  10 111  72  42  60  58 207 193 127  57 167  12 203 163 152  17  21
## [127]  16  66 111  58  60  49 116  62  59  99  35  11 183   8   8 163  12  36
## [145] 281  14  37  32 233 117  96 168  50 213  80  81 196  55  14   5  98  45
## [163]  63  30 170  76  73  52 153  49  39  71  32  11  76   7 155  93 123  79
## [181] 144  71 132  51  21  31  51  17 123  10  44  27  34  77 208  19 119  95
## [199] 120 138  80  22  11  39  91  31  70 131 111  77  47 135  14  99  33  27
## [217]  33 160 105  23  24  41 148  48 153  35  75  36 123  90  71  80  41  74
## [235] 123  68  17  68  62 161  93 100 144 115  82  19  93   0  82  19  18  32
## [253] 134 127  46 146  32 150 173 126 133  32  42 156  84  15 110 189  11  92
## [271] 252  55 154  82  60 100 104 144  28  55 165 110  88  62  25 121  33  10
## [289]  18  20 117  48  65  81  66 225  18 189  31 221  71  12  13  56  99 113
## [307]  24  17  19  23 153  53  67  79 293 113  12  64  43  26  73  47  42  28
## [325] 105 163 102  64  64  16 259  54  46  59  87   1  98 110  57 130  59   9
## [343] 203 221  57 180  66 114  32  93  32  70  23  58  45   8  19 107  60  30
## [361]  98   7  15  12  70 150 129 121  19 240 332  10  62   8   8  62  81  18
## [379] 135 128   2 116  34  29   4 118  21 154 137  54  89 109  80   6  34 184
## [397]  71 126 213 114 322  13  45  97  33 165  88  12  70  98 293  24  59  10
## [415] 170  53  79  70  56   5  53 174  45  66  33 105  26  78 159 123 104 105
## [433]   0  78  53  49 185  39 176  21  17  15 160 132 163  79  30  27 366  96
## [451]  75 104 118 122  14   5  97

Reads of all the AML samples (mutated and non-mutated) for the downstream canonical splice junction (DE):

GeneSJ$chr17_7674291_7674858
##   [1] 401  85 346   5  45 318 647 433 220 197 388 393  82  35 270 310 242 692
##  [19] 136 461  37 326 216  25  99  97 154   7  49 347 603 109 585 256  85  76
##  [37] 289  77 125 102 371  76 407 356 178 279 297  16 152 148  95  49 108 287
##  [55]  17 423  27 193 491 369 124   9 158 134 175  23 153 134 478 114 501 141
##  [73]  66 428 285 344  77 803  78 456 540 234  28 535  90 472  47 374 112 117
##  [91] 524 301 167 105  81 391 364 308 249 357 236 337 432  80 314 462 272 168
## [109]  15  16 306 257 118 223 128 517 731 243 152 360  49 358 371 463   2  78
## [127]  20 145 303 173 152 153 268 248 154 264  83  18 504  17  21 398  33  88
## [145] 556  43 105  79 628 403 297 388 132 464 226 303 618 175  40  28 276  80
## [163] 107  85 691 144 210 113 349 131  53 156 143  18 187  11 598 268 391 126
## [181] 444 177 473 132  64  72 218  65 334  27 141  62  85 195 593  33 299 253
## [199] 409 242 164  41  30  55 205  87 230 342 282 130 205 210  34 338  46  32
## [217]  85 469 262  70  45 101 379  81 427  57 183  80 316 307 151 225 110 211
## [235] 484 199  21 130 174 449 258 155 355 361 239  54 192   1 209  93  40  90
## [253] 359 389 148 254 102 230 507 309 344  99 108 499 338  21 330 454  30 244
## [271] 807 137 316 302 211 172 388 297 100 239 371 269 131 207  64 194  41  36
## [289]  66  45 359  96 137 138 148 349  31 496  72 512 100  21  69 162 376 368
## [307]  66  44  47  52 424 166 207 224 507 325  52 190 119 105 155 131  80  83
## [325] 301 656 231 185 320  57 419 118 116 130 132   1 236 281 195 233 198  25
## [343] 479 711 141 680 142 380 140 249  38 207  42 146 119  31  42 143 197 141
## [361] 341  26  65  34 140 306 334 395  73 380 633  29 155  28  19 162 201  36
## [379] 371 308   4 289  92  73   5 307  84 321 385 117 203 299 320  15  44 342
## [397] 170 360 510 296 475  25 109 259  89 702 198  47 263 263 492  34 133  24
## [415] 460  76 288 253 204  17 129 299 150 172  95 272  41 253 506 385 316 328
## [433]   2 232 184  80 495  97 335  33  28  24 374 427 254 155  88  47 544 386
## [451] 152 306 350 265  49   8 261

12.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr17_7674291_7674858)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_ES <- (GeneSJ$chr17_7674291_7675052)/GeneSJ$rowSum_SJtotal*100

12.3 VAF

Mutated samples vaf:

12.4 Plots

12.4.1 Static Dot Plots

Canonical splice junction Exon downstream (DE): chr17:7674291-7674858, donor splice site chr17:7674858

ggplot(GeneSJ, aes(sample_id,Normalized_CanonDE,color=GROUP)) +
  geom_point(size=.8) +
  labs(color='GROUP', 
       title = "Normalized Expression of TP53 chr17:7674291-7674858", 
       subtitle="Potential Donor Loss Effect on Canonical SJ",
       y = "Normalized Expression")

Splicing alterations:

12.4.2 Interactive Dot Plots

Canonical splice junction Exon downstream (DE): chr17:7674291-7674858, donor splice site chr17:7674858

Splicing alteration:

12.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Donor Loss:

Exon Skipping:

12.5 Statistical Analysis

SJCounts <- GeneSJ

12.5.1 Donor Loss

12.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.90368, p-value = 2.288e-16

Value of Mean Normalized Expression of the Canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 10.52434

Normalized Expression Values of the Canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 10.95890 11.65865

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.4345613 1.1343110

12.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:454] = -10.14, -8.0853, -7.0494,  ..., 6.1423, 7.2388
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.5670330 0.7758242
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr17:7674291-7674858"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

12.5.2 Exon Skipping

12.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_ES[SJCounts$GROUP == "WT"]
## W = 0.5814, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_ES[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1276335

Normalized Expression Values of the Alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_ES[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.0000000 0.1201923

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_ES - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.12763350 -0.00744119

12.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:272] = -0.12763, -0.11317, -0.11177,  ..., 1.3908, 1.7265
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.3912088 0.7230769
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_ES")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Exon Skipping"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr17:7674291-7675052"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
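
The two ECDF steps in this section use opposite tails. The ECDF of the WT deviations returns, for each MUT sample, the fraction of WT samples at or below its value. For Donor Loss that fraction is taken directly as the p-value (lower tail: unusually low use of the canonical SJ), while for Exon Skipping the p-value is `1 - ECDF` (upper tail: unusually high use of the alternative SJ). A minimal sketch on toy data follows; the helper name `ecdf_pvalue` and the values are hypothetical.

```r
# Hypothetical helper: empirical one-sided p-value of `obs` against a WT reference.
# tail = "lower": fraction of WT values <= obs (as used above for Donor Loss)
# tail = "upper": fraction of WT values >  obs (as used above for Exon Skipping)
ecdf_pvalue <- function(obs, wt_values, tail = c("lower", "upper")) {
  tail <- match.arg(tail)
  f <- ecdf(wt_values)
  if (tail == "lower") f(obs) else 1 - f(obs)
}

# Toy WT deviations from the WT mean (illustrative only)
wt_diff <- c(-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5)

p_lower <- ecdf_pvalue(-0.35, wt_diff, tail = "lower")  # 1 of 10 WT values <= -0.35
p_upper <- ecdf_pvalue(0.45,  wt_diff, tail = "upper")  # 1 of 10 WT values >  0.45
c(p_lower, p_upper)
```

With a reference of n WT samples, the smallest non-zero p-value this approach can return is 1/n, so the resolution depends on the size of the WT group.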

13 ASXL1 chr20,32433787,C,T

Variant found in 1 patient of the BeatAML (1 sample)

  • Patients with ASXL1 chr20,32433787,C,T: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2 bp from the variant, chr20:32433786, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"ASXL1_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: Mutated (MUT) or Non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="ASXL1" & found_variants$MutationKey_Hg38 == "chr20,32433787,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

13.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

13.1.1 Donor Gain

Search: predicted at 2bp from the variant, chr20:32433786

Show all the splice junctions with a coordinate between 32433780 and 32433789:

colnames(GeneSJ)[grep("3243378",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

13.2 VAF

Mutated samples vaf:

14 EP300 chr22,41160652,T,C

Variant found in 2 patients of the BeatAML (2 samples)

  • Patients with EP300 chr22,41160652,T,C: 2 patients (2 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at the variant position, chr22:41160653, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"EP300_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: Mutated (MUT) or Non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="EP300" & found_variants$MutationKey_Hg38 == "chr22,41160652,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

14.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

14.1.1 Donor Gain

Search: predicted at the variant position, chr22:41160653

Show all the splice junctions with a coordinate between 41160650 and 41160659:

colnames(GeneSJ)[grep("4116065",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

14.2 VAF

Mutated samples vaf:

15 DNMT3A chr2,25239130,C,T

Variant found in 1 patient of the BeatAML (1 sample)

  • Patients with DNMT3A chr2,25239130,C,T variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Loss: chr2:25237006-25239129; donor splice site chr2:25239129.
  • Donor Gain: chr2:25237006-25239139, found in the mutated samples.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"DNMT3A_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: Mutated (MUT) or Non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="DNMT3A" & found_variants$MutationKey_Hg38 == "chr2,25239130,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

15.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

15.1.1 Donor Gain

Search: chr2:25237006-25239139

Show all the splice junctions containing the position chr2:25237006-25239139

colnames(GeneSJ)[grep("25237006_25239139",colnames(GeneSJ))]
## [1] "chr2_25237006_25239139"

Found: chr2_25237006_25239139

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_25237006_25239139
##   [1]  0  4  5  0  3  3  4  2  0  5  0  0  4  0  0  7 14  6  0  7  0  5  2  0  0
##  [26]  3  3  0  6  4  0  2  0  3  0  0  6  2  2  0  0  7 12  0  0  6  0  0  0  0
##  [51]  3  0  1  2  0  7  0  0  0  0  4  0  1  4  0  0  4  0  0  0 12  0  0  0  4
##  [76]  0  0 23  0  7  0  1  2  9  2  0  3  3  0  0  0  4  0 11  6  0  2  0  0  5
## [101] 10  0  2  0  5 15  6  5  0  0  0  0  4  2  4  0  9  3  0  0  0  0  8  7 10
## [126]  0  3  0  0  0  0  0  0  4  0  0  0  0  0  0  0  0  0  0 13  2  0  4  6  3
## [151]  0  0  5  2 17  4  7  0  0  0  0 12  0  0  4  0  0  5  4  2  2  0  0  0  1
## [176]  0  3  3  2  0  2  0  7  0  2  2  0  0  3  0  0  0  3  0  4  3  4  5  5  5
## [201]  1  0  0  0  0  6  0  6  5  3  4  3  0  0  3  4 10  4  0  0  0  1  4  0  0
## [226]  5  3  0  0  0  7  8  0  0  7  0  0  7  2  4  0  0  6  7  5  8  0  0  0  0
## [251]  0  0  0  0  0  7  0  0  6  0  4  0  0  6  0  9  6  2  3  0  0  3  5  2  0
## [276]  5  0  0  3  4  0  0  5  4  3  2  0  2  0  0  0  0  7  0  0  0  3 14  0  5
## [301]  0  0  2  3  3  2  4  0  3  0  1  3  0  0 11  7  0  2  0  0  0  0  0  1  5
## [326]  0  0  2  0  2  5  0  4  2  4  7  8  4  0  0  0  0  1 11  0  0  3  0  7  2
## [351]  0  0  2  3  7  0  6  1  4  2  3  0  1  0  0  0  0  7  5 14  0  7  0  0  2
## [376]  0  2  0  5  2  0  0  0  4  0  8  0  3  0  0  7  3  0  0  0  9  0  5  0  0
## [401]  9  4  0  0  3  0  3  0  8  4  2  0  0  0  6  9  7  0  0  0  3 12  4  5  4
## [426]  6  8  6  0  4  4  3  0  5  0  2  0  0  7  6  2  0  5  0  6  0  1  0  6  0
## [451]  4  8  0  4  5  0  2

Samples with the SJ of interest:

table(GeneSJ$chr2_25237006_25239139>0) 
## 
## FALSE  TRUE 
##   226   231

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr2_25237006_25239139 > 0])
## 
## MUT  WT 
##   1 230

Alternative SJ found in the mutated samples.

15.1.2 Canonical SJ

Exon upstream (UE): chr2:25239216-25240301

Exon downstream (DE): chr2:25237006-25239129; donor splice site chr2:25239129

Reads of all the AML samples (mutated and non-mutated) for the upstream canonical splice junction (UE):

GeneSJ$chr2_25239216_25240301
##   [1] 106  34  89   9  25  79  79 115  36  48  77  56  62  15  62 194 133 135
##  [19]  24  99  23  72  62  30 167  37  34   7  47 136  49  76  55  50  32  81
##  [37]  72  68  49  15  25  91 216 108  26 114  22  21  74   9  65  34  24  78
##  [55]  29  96  18  36 129 123 108  38  32  56  23  44 122  57 179 137 267 123
##  [73] 104  76  86 111  34 420  43 151 101  16  87 147  77 145  35  82  38  72
##  [91] 115 147  24 118 116  35  58  82  49  96 101  37  99  81  62 392 133  44
## [109]  22  14  68  47  78  84  60 143 128 138  35 120 116  26 206 170 140  21
## [127]  50  28 109  90  37  87  35 107  14  29  91  16 154  18  15 160  27  40
## [145] 179  50  35  80  97 105  28  66 107  32 186  90 171  53  33  36  47 117
## [163]  25  92 116  39  81  79  66  30  31 107  49  45  74  50 192  94  56  65
## [181] 141  87 225  17  15  32  31  72 114   8  95  16  20 107 147  55  83 104
## [199] 140 137  57  24  14  65  32  62  76  81  27  89  94  81  37  29  56  35
## [217] 137  81  63  54  28  37  65  46 270 137  33  15 121  68  76 101  33  58
## [235] 131  46  49  94  27 140  67 139 163  84 101 258 100  11  84  38  48  87
## [253]  77  93  98 118  56  84 158  93 109  96  86 128  42 111 112  98  35  81
## [271]  89  72 137  59  40  71  50  95  41  81 117  99  73  54  28  58  85  11
## [289]  14  57  82 149 156  28  44  89  86 373  13 185 131  16  32  52 207  71
## [307]  47  30  32  46  46  44  23 105 273  83  11  28  50  79  20  51  75  71
## [325]  84 144  22  39  55  13 116  65  29  65 104 109 130 126  52  46  48  30
## [343]  90 198  89 100  94  81 108 126  81  87  37  59  80  18 172  69 113  40
## [361] 125  47  18   8  19 119 112 159  70 339 207 106  70  57  52  44  58  46
## [379]  43 123   0  85  22  42  28 129  61  51 131  78 107 115  58  51  67 131
## [397]  60  61  68  33 143  44  37  88  67 310 132  66 109 124  67  41  29  34
## [415]  73  88 142  69  47  32  46 124 130  48  48 107 184  86 125 127  89  91
## [433]   0 118  38  44 122  39 231  56  51  26 132  51 217  44  17  35 123  94
## [451]  54 117  90  64  32  11  67

Reads of all the AML samples (mutated and non-mutated) for the downstream canonical splice junction (DE):

GeneSJ$chr2_25237006_25239129
##   [1] 138  65 159  12  49 137 167 172  50  87 105  70 119  28 124 304 178 254
##  [19]  38 127  32 138  93  44 232  71  75  20  92 198  83 129  75  85  47 129
##  [37] 118  97  61  25  40 160 208 178  44 199  20  47 111  18 102  46  46 139
##  [55]  58 131  26  42 219 180 165  70  45 105  39  58 208  85 281 194 441 196
##  [73] 168 145 157 128  65 761  75 255 195  36 191 251 136 272  53 106  50  91
##  [91] 186 253  36 188 180  61  91 128  72 167 156  63 179 131  99 460 191  63
## [109]  51  31  95  81 110 144  98 245 232 184  75 188 195  37 352 287 225  51
## [127]  89  55 181 168  68 140  65 163  25  56 140  27 238  35  24 215  62  58
## [145] 292  97  65 131 157 165  44  95 138  46 321 159 259  74  61  64  64 142
## [163]  60 152 238  54 120 122 104  55  50 171  85  66 136  68 343 146  93 113
## [181] 247 158 379  36  34  52  72 132 197  19 136  29  45 134 223  80 128 174
## [199] 277 205  95  39  20  86  41  94 109 158  47 110 156 105  50  68  75  56
## [217] 192 140  96  93  39  55  99  64 424 184  56  26 195 110  99 214  46  96
## [235] 183  89  86 184  36 285 144 174 246 161 157 409 167  20 138  54  70 126
## [253] 121 130 140 160  78  76 241 180 162 151 126 207  92 212 188 160  76 124
## [271] 150 147 215 105  56  97  98 167  59 129 174 166 107  96  30  78 160  13
## [289]  26 103 130 153 272  62  65 107  97 546  21 198 144  32  35  65 297 122
## [307]  69  44  53  97  94  71  67 143 342 108  14  53 102 116  38  89 159  98
## [325] 122 238  30  84  99  29 173 118  75  96 134 152 217 180  87  87  69  55
## [343] 125 292 142 174 123 126 175 185 108 166  57 105 136  34 269  95 197  79
## [361] 201 101  41  22  31 199 157 212 129 437 284 199 102 107  70  68 136  67
## [379]  80 228   1 154  38  81  49 233  72  87 213 135 164 191 104  88 115 223
## [397]  65  90 119  60 161  76  81 136 138 532 211 119 151 226  77  73  47  70
## [415] 123 112 214 112  95  47  75 143 259  60  71 211 278 171 205 219 158 124
## [433]   0 181  77  65 214  52 370  75  69  42 178  87 255  75  21  37 159 162
## [451]  91 198 176 111  53  11 116

15.2 Normalization

Sum the reads of all the splice junctions of the gene harboring the variant, per sample:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr2_25237006_25239129)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_DG <- (GeneSJ$chr2_25237006_25239139)/GeneSJ$rowSum_SJtotal*100
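As a minimal sketch of the normalization above, the toy example below expresses one junction as a percentage of all junction reads of the gene. The sample names, junction names and counts are invented, and the anchored column pattern and the explicit handling of zero totals are assumptions rather than part of the original pipeline.

```r
# Toy junction-count table: one row per sample, one column per splice junction.
# Column names and counts are invented for illustration only.
toy_sj <- data.frame(
  sample_id = c("S1", "S2", "S3"),
  chrN_100_200 = c(90, 45, 0),
  chrN_100_210 = c(10, 5, 0),
  stringsAsFactors = FALSE
)

# Anchoring the pattern to chr_SJstart_SJend names avoids summing any other
# column whose name merely contains "chr".
sj_cols <- grep("^chr[^_]+_[0-9]+_[0-9]+$", names(toy_sj), value = TRUE)
toy_sj$rowSum_SJtotal <- rowSums(toy_sj[, sj_cols, drop = FALSE])

# Percentage of the gene's junction reads supporting the junction.
# A sample with zero total reads would give 0/0 = NaN, so it is set to NA.
toy_sj$Normalized_Canon <- ifelse(
  toy_sj$rowSum_SJtotal > 0,
  toy_sj$chrN_100_200 / toy_sj$rowSum_SJtotal * 100,
  NA_real_
)
```

Samples S1 and S2 have different depths but the same junction usage, so they receive the same normalized value, while S3 (no junction reads for the gene) is left undefined instead of being treated as zero expression.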

Download the normalized values for the assessed splice junctions of all the AML samples:

15.3 VAF

VAF of the mutated samples:

15.4 Plots

15.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

15.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

15.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Donor Loss:

Exon Skipping:

15.5 Statistical Analysis

SJCounts <- GeneSJ

15.5.1 Donor Loss

15.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.92632, p-value = 3.542e-14
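The very small p-value rejects normality of the WT values, which is consistent with inferring the p-values further below from the empirical distribution rather than from a normal approximation. A minimal sketch on simulated right-skewed data (not BeatAML values; the seed, the distribution and the query value are arbitrary assumptions) of how the two tail estimates can be compared:

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
wt_sim <- rexp(300, rate = 5)    # simulated right-skewed values, not real data

sw <- shapiro.test(wt_sim)       # Shapiro-Wilk test on the simulated values

x <- 0.6                         # hypothetical value seen in a mutated sample

# Upper-tail probability under a normal approximation of the WT values
p_normal <- pnorm(x, mean = mean(wt_sim), sd = sd(wt_sim), lower.tail = FALSE)

# Upper-tail probability read directly from the data (same as 1 - ECDF)
p_empirical <- mean(wt_sim > x)
```

When the distribution is skewed, the two estimates can differ noticeably in the tail, and the empirical one makes no distributional assumption.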

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 6.303546

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 6.47526

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.1717137

15.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:453] = -6.3035, -3.3822, -2.3648,  ..., 4.8076, 7.5853
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6052632
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:25237006-25239129"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
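The ECDF step can be illustrated on a toy vector (invented values, not cohort data). The sketch assumes the convention used in this report: the lower tail is taken as the p-value for loss predictions and the upper tail for gain predictions. It also checks that subtracting the WT mean from every value, as done above, leaves the percentile unchanged.

```r
# Invented WT normalized expression values and one mutated-sample value.
wt  <- c(0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0)
mut <- 0.8

wt_ecdf <- ecdf(wt)               # step function: fraction of WT values <= x

lower_tail <- wt_ecdf(mut)        # loss predictions: is the value unusually low?
upper_tail <- 1 - wt_ecdf(mut)    # gain predictions: is the value unusually high?

# Centering on the WT mean shifts every value by the same amount,
# so the rank of the mutated sample among the WT samples does not change.
shifted <- ecdf(wt - mean(wt))(mut - mean(wt))
```

Note that the upper tail counts only WT values strictly above the mutated value, so a WT sample tied with the mutated sample is not counted as more extreme.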

Download the VAF, inferred percentiles and p-values of the mutated samples:

15.5.2 Donor Gain

15.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_DG[SJCounts$GROUP == "WT"]
## W = 0.81071, p-value < 2.2e-16

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.1089652

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_DG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.1832621

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_DG - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.07429686

15.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:225] = -0.10897, -0.056963, -0.055346,  ..., 0.37789, 0.39103
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.7171053
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr2:25237006-25239139"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

16 DNMT3A chr2,25235792,T,C

Variant found in 1 patient of the BeatAML cohort (1 sample)

  • Patients with DNMT3A chr2,25235792,T,C variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 5bp from the variant, chr2:25235796, not found in the splice junction collection.
  • Acceptor Gain: chr2:25235821-25236935, found in the mutated samples.
  • Acceptor Loss: chr2:25235826-25236935, acceptor splice site chr2:25238256.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"DNMT3A_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="DNMT3A" & found_variants$MutationKey_Hg38 == "chr2,25235792,T,C",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
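A quick sanity check on the grouping can catch RNA sample identifiers that are flagged as validable but are absent from the junction table. A minimal sketch with invented identifiers (toy_sj and its contents are hypothetical):

```r
# Toy junction table and case list; "S9" is deliberately absent from the table.
toy_sj <- data.frame(sample_id = c("S1", "S2", "S3", "S4"),
                     stringsAsFactors = FALSE)
cases <- c("S2", "S9")

toy_sj$GROUP <- ifelse(toy_sj$sample_id %in% cases, "MUT", "WT")

# Cases that could not be labelled because they have no row in the table
missing_cases <- setdiff(cases, toy_sj$sample_id)

# Fixed factor levels keep a zero count visible if no MUT sample is matched
group_counts <- table(factor(toy_sj$GROUP, levels = c("MUT", "WT")))
```

If missing_cases is not empty, the number of MUT samples in the table is lower than the number of validable samples reported for the variant.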

16.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

16.1.1 Donor Gain

Search: predicted at 5bp from the variant, chr2:25235796

Show all the splice junctions containing the positions between 25235790-25235799

colnames(GeneSJ)[grep("2523579",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.
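Because grep() matches the pattern anywhere in the column name, a substring search can also hit coordinates that merely contain the same digits. As a hedged alternative, the hypothetical helper below parses chr_SJstart_SJend names and compares the coordinates numerically. The name find_sj_in_window and the toy column names are assumptions for illustration, and the pattern assumes chromosome names without underscores.

```r
# Hypothetical column names following the chr_SJstart_SJend convention.
sj_names <- c("chrN_1000_2005", "chrN_1000_2100", "chrN_2003_3000", "sample_id")

# Return the junctions whose start or end falls inside [lo, hi].
find_sj_in_window <- function(col_names, lo, hi) {
  sj <- grep("^chr[^_]+_[0-9]+_[0-9]+$", col_names, value = TRUE)
  parts <- strsplit(sj, "_", fixed = TRUE)
  sj_start <- as.numeric(vapply(parts, `[`, character(1), 2))
  sj_end   <- as.numeric(vapply(parts, `[`, character(1), 3))
  in_window <- (sj_start >= lo & sj_start <= hi) | (sj_end >= lo & sj_end <= hi)
  sj[in_window]
}

hits <- find_sj_in_window(sj_names, lo = 2000, hi = 2009)
none <- find_sj_in_window(sj_names, lo = 5000, hi = 5009)
```

An empty result (character(0)) has the same meaning as in the lookup above: no junction in the collection starts or ends in the window.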

16.1.2 Acceptor Gain

Search: chr2:25235821-25236935

Show all the splice junctions containing the position chr2:25235821-25236935

colnames(GeneSJ)[grep("25235821_25236935",colnames(GeneSJ))]
## [1] "chr2_25235821_25236935"

Found: chr2_25235821_25236935

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_25235821_25236935
##   [1]  6  0  8  0  0  2 10  2  4  5  4  3  9  0  6 16 15 13  3  1  1  4  0  4 10
##  [26]  8  0  0 10 10  5  8  5  5  4  8  5  5  6  3  0 10 11 10  2  9  0  2  6  1
##  [51]  7  5  0  0  3 10  0  3 12  5  6  4  2  7  0  0 14  6 12  4 11  0  5  0  5
##  [76]  6  1 32  6 23 14  0  6 10  3 19  5  0  5  4  5 18  0  8 14  0  0  6  2  5
## [101] 16  3  8  5  2 29  8  3  2  0  5  5  9  9  0  5  7  4  3  4  7  0 12 13  5
## [126]  4  3  0 11  8  1  8  6 10  0  0  8  0 12  1  0  7  4  3 12  9  4  5  5  3
## [151]  0  4  7  2 22  7 17  7  5  0  1  8  4  8 27  0  3  8  3  3  3  8  4  9 10
## [176]  7  8  7  0  4 10  2 18  0  0  0  0  5  7  0  4  4  0  8  8  4  4 11  6 14
## [201]  8  3  0  4  3  0  4  6  2  5  9  6  0  0  4  0  7  8  2  4  0  0  2  1 22
## [226] 15  0  0  5  6  5  9  4  9 10  8  6 12  3  9 11  9  7 11  9 18  0  1  0  3
## [251]  7  8 10  6  6  5  4  4 11  7  4  4  4 15  6  7 12  7  9  6  3  7  6  0  2
## [276]  5  0  8  4  7 11 11  7  5  3  8 13  0  0  0  7  7 17  3  0  5  0 25  2  7
## [301]  8  1  0  0  9  6  0  7  2  2  0  0  2  9 13  6  1  2  8 10  1  1  7  2  6
## [326] 14  0  0  5  0 10  1  4  2  8  4  0  7  0  1  5  5  5 18 10 11  8  7 11  9
## [351] 10 13  2  0  8  0 17  8 14  0  4  0  3  1  3 12  3  7  8 25 11 14  6  0  2
## [376]  4  6  5  4  9  0  5  0  7  4 13  9  6 14  4 12  9  4  4  7  5  1  6  2  0
## [401]  0  8  2  7  5 15  7  8 10 11  3  5  5  0  4  5 13  3  0  1  9  7 16  6  2
## [426]  5 17 12 14 14  5  5  0  7  8  7 12  0 13  2  2  4  7  4 21  2  0  0  6  6
## [451]  6  4  4  3  3  1  5

Samples with the SJ of interest:

table(GeneSJ$chr2_25235821_25236935>0) 
## 
## FALSE  TRUE 
##    82   375

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr2_25235821_25236935 > 0])
## 
## MUT  WT 
##   1 374

Alternative SJ found in the mutated samples.
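The table above only counts the samples that carry the junction. A two-way table per group makes the WT background explicit, which helps to judge whether the junction is specific to the mutated samples. A minimal sketch with invented counts (toy is hypothetical):

```r
# Invented read counts for one junction in five samples.
toy <- data.frame(
  GROUP = c("MUT", "WT", "WT", "WT", "WT"),
  reads = c(3, 0, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Cross-tabulate group against presence of the junction (at least one read).
tab <- table(
  GROUP  = factor(toy$GROUP, levels = c("MUT", "WT")),
  has_SJ = factor(toy$reads > 0, levels = c("FALSE", "TRUE"))
)

# Fraction of samples in each group that carry the junction
frac_with_sj <- prop.table(tab, margin = 1)[, "TRUE"]
```

When most WT samples also carry the junction, its mere presence in a mutated sample is not informative, and the comparison has to rely on the normalized expression level instead.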

16.1.3 Canonical SJ

Exon upstream (UE): chr2:25235826-25236935, acceptor splice site chr2:25238256

Exon downstream (DE): chr2:25234421-25235706, donor splice site chr2:25235706

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_25235826_25236935
##   [1] 178  79 195  15  71 166 206 217  57 110 121  84 159  38 189 361 239 277
##  [19]  56 145  39 189 137  50 270 106  83  29 102 249  90 146  99 116  60 162
##  [37] 159 123  74  32  56 201 254 201  56 273  25  65 144  29 121  48  63 221
##  [55]  84 166  30  49 296 216 207  88  51 134  47  76 270 112 328 253 510 241
##  [73] 246 170 195 131  93 874  90 314 224  43 212 291 172 354  80 111  58 139
##  [91] 265 321  51 220 227  66  86 165  93 214 214  81 192 148 133 567 213  84
## [109]  54  28 131  88 152 196 113 288 309 218  89 192 191  39 424 338 249  52
## [127]  95  59 204 215  89 175  87 209  34  75 159  38 289  33  31 267  77  67
## [145] 353 101  65 150 166 248  68 111 175  62 445 203 323  98  69  91  83 173
## [163]  69 165 316  71 124 148 115  70  43 224 136  75 163  88 473 154 122 116
## [181] 284 173 469  52  34  54  97 147 211  29 173  37  63 149 270 103 171 207
## [199] 305 285 130  34  28 109  45 110 141 190  60 115 199 132  63  85  83  70
## [217] 205 172 118 108  57  72 143  96 546 224  69  25 208 137 125 261  59 111
## [235] 235 100  87 223  59 336 168 230 318 207 184 457 198  23 153  69  87 160
## [253] 139 158 160 174 104 112 338 212 199 190 167 233 130 287 231 166  91 141
## [271] 183 157 265 130  77 124 122 201  77 198 219 185 126 116  46  87 186  25
## [289]  32 105 195 143 308  60  75 163  61 684  26 183 165  43  43  66 400 134
## [307]  92  57  74 113  99  91  88 174 387 169  24  58 117 150  30  86 185 110
## [325] 144 283  14  90 148  40 204 125  86 106 161 161 253 187 117  96  97  55
## [343]  86 366 169 183 147 170 245 233 126 193  59 106 166  34 319 112 211  96
## [361] 245 111  40  21  42 249 199 251 170 545 358 238 119  95  74  86 140  79
## [379]  93 237   1 189  45  98  72 263  95 100 258 148 222 243 138  99 129 291
## [397]  71 107 118  72 228  84  99 154 169 627 264 145 228 311  87  94  53  78
## [415] 152 142 272 154 111  72  95 197 279  93 102 246 315 213 242 229 219 135
## [433]   1 233  89  76 245  55 449 103  86  37 187  86 320  96  24  30 218 209
## [451] 106 230 227 127  82  14 141

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr2_25234421_25235706
##   [1] 145  51 152  16  72 152 183 209  47 108  81  74 109  37 176 297 173 221
##  [19]  31  92  39 140  94  37 173  56  73  31  79 166  90 110  67  74  66 100
##  [37] 106  81  51  10  61 168 171 121  35 200  26  31 141  22  84  36  31 191
##  [55]  56 158  18  45 242 146 163  73  36 144  41  62 199 119 212 161 348 161
##  [73] 115 148 144 162  55 734  82 248 235  38 136 244 140 305  85  96  46 131
##  [91] 203 331  45 203 216  34  62 115  52 203 170  74 138 113 110 443 156  36
## [109]  58  15 102  78 127 188  55 246 280 161  71  93 138  25 298 285 230  39
## [127]  67  57 174 117  57 164  46 185  23  63 133  31 227   7  20 180  60  44
## [145] 239  93  55 153 163 154  44  68 133  60 388 135 253  70  50  75  53 155
## [163]  51 147 315  44  93  91  79  47  31 164 106  50 122  71 432  98  75  76
## [181] 220 130 488  35  22  49  86 121 146  10 177  39  42 135 207  82 121 153
## [199] 252 187  73  36  22  85  35  64 122 165  61  72 182  98  40  88  53  57
## [217] 172 137  94  78  54  36 111  60 430 164  67  17 128 114 116 188  44  85
## [235] 212 112  69 175  51 243 134 182 222 178 179 387 135  13 112  66  58 116
## [253] 102 136 127 118  84  74 258 130 129 125 103 181 120 182 203 118  71  90
## [271] 116 132 141 157  57  73  79 140  71 193 190 112  88  78  33  66 145  24
## [289]  33  54 137 172 228  42  40 100  71 459  28 207 133  16  42  76 290 103
## [307]  51  44  45  71  70  66  58 126 276  89  13  47  80 137  29  56 141  77
## [325] 164 242  23 105 107  35 160  80  58  79 112 119 208 162  76  56  75  50
## [343] 157 274 133 195 109 147 211 149  68 166  61  91 107  34 212  95 187  92
## [361] 235  80  52  18  24 144 132 228 124 409 237 210  97  52  62  55 105  56
## [379]  89 159   0 142  35  72  50 235 100  66 167 110 140 198 160  80  83 167
## [397]  54  71  63  46 160 112  63 106 106 644 173 136 206 275  56  57  27  47
## [415]  88  83 284 132  94  91 106 132 178  51  69 166 243 197 180 200 233 100
## [433]   2 238  96  64 149  51 299  54  43  39 146  84 235  59  17  28 173 200
## [451]  80 144 184  82  58  12 131

16.2 Normalization

Sum the reads of all the splice junctions of the gene harboring the variant, per sample:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonUE <- (GeneSJ$chr2_25235826_25236935)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonDE <- (GeneSJ$chr2_25234421_25235706)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_AG <- (GeneSJ$chr2_25235821_25236935)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

16.3 VAF

VAF of the mutated samples:

16.4 Plots

16.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

16.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

16.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Acceptor Loss:

Acceptor Gain:

16.5 Statistical Analysis

SJCounts <- GeneSJ

16.5.1 Acceptor Gain

16.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_AG[SJCounts$GROUP == "WT"]
## W = 0.95965, p-value = 7.47e-10

Value of Mean Normalized Expression of the Alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.2731979

Normalized Expression Value of the Alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_AG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.6269592

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_AG - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.3537613

16.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:369] = -0.2732, -0.21764, -0.21332,  ..., 0.55227, 0.61613
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9671053
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_AG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr2:25235821-25236935"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
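The same steps (WT ECDF, tail selection, masking of zero-expression samples) recur for every junction in this report. A hypothetical helper could wrap them. In the sketch below, score_sj, its arguments and the toy data are assumptions. It evaluates the ECDF on the normalized values directly, which gives the same percentile as evaluating it on the mean-centered differences.

```r
# Hypothetical helper: percentile and p-value of each MUT sample against WT.
score_sj <- function(df, value_col, direction = c("gain", "loss")) {
  direction <- match.arg(direction)
  wt  <- df[[value_col]][df$GROUP == "WT"]
  mut <- df[df$GROUP == "MUT", c("sample_id", value_col)]
  names(mut)[2] <- "NormalizedExpression"

  mut$ECDF <- ecdf(wt)(mut$NormalizedExpression)
  # Upper tail for gains, lower tail for losses
  mut$Pvalue <- if (direction == "gain") 1 - mut$ECDF else mut$ECDF

  # As in the report, samples without reads for the junction are not scored
  zero <- which(mut$NormalizedExpression == 0)
  mut$ECDF[zero] <- NA
  mut$Pvalue[zero] <- NA
  mut
}

# Toy data: two MUT samples (one without reads) and four WT samples.
toy <- data.frame(
  sample_id = paste0("S", 1:6),
  GROUP = c("MUT", "MUT", "WT", "WT", "WT", "WT"),
  Normalized_AG = c(0.9, 0, 0.1, 0.2, 0.3, 1.0),
  stringsAsFactors = FALSE
)
res <- score_sj(toy, "Normalized_AG", direction = "gain")
```

Calling the helper once per junction with direction set to "gain" or "loss" would keep the tail convention in a single place instead of repeating it in each block.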

Download the VAF, inferred percentiles and p-values of the mutated samples:

16.5.2 Acceptor Loss

16.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"]
## W = 0.9343, p-value = 2.758e-13

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 7.623696

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonUE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 9.090909

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonUE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 1.467213

16.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:451] = -4.2983, -3.0101, -2.8542,  ..., 4.8763, 8.3485
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9407895
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonUE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:25235826-25236935"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
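Since the ECDF is the fraction of WT samples at or below the mutated value, the printed value can be turned back into a sample count. The sketch assumes that all 456 WT samples (the 457 entries of the count vectors above minus the one mutated sample) have a defined normalized value.

```r
n_total <- 457               # length of the per-sample count vectors shown above
n_mut   <- 1                 # one validable mutated sample for this variant
n_wt    <- n_total - n_mut   # assumption: no WT sample was dropped as NA/NaN

ecdf_value <- 0.9407895      # value printed by v_ecdf() for the canonical UE junction

# Number of WT samples at or below the mutated sample
n_at_or_below <- round(ecdf_value * n_wt)

# Recompute the fraction from the count to check that it matches the printed value
recomputed <- n_at_or_below / n_wt
```

Under this reading, most WT samples show canonical junction usage at or below that of the mutated sample, so a lower-tail value of this size gives no indication of reduced usage of the canonical junction.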

Download the VAF, inferred percentiles and p-values of the mutated samples:

16.5.3 Donor Loss

16.5.3.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.91514, p-value = 2.545e-15

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 5.868508

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 4.806688

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT sample: deviation of the Normalized expression of the MUT patient from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -1.06182

16.5.3.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:449] = -5.8685, -4.0264, -3.4704,  ..., 6.2884, 8.4172
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.1337719
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr2:25234421-25235706"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

17 DNMT3A chr2,25244214,G,A

Variant found in 1 patient of the BeatAML cohort (1 sample)

  • Patients with DNMT3A chr2,25244214,G,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr2:25244215, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"DNMT3A_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="DNMT3A" & found_variants$MutationKey_Hg38 == "chr2,25244214,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

17.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

17.1.1 Donor Gain

Search: predicted at 2bp from the variant, chr2:25244215

Show all the splice junctions containing the positions between 25244210-25244219

colnames(GeneSJ)[grep("2524421",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

17.2 VAF

VAF of the mutated samples:

18 DNMT3A chr2,25240693,C,A

Variant found in 1 patient of the BeatAML cohort (1 sample)

  • Patients with DNMT3A chr2,25240693,C,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr2:25240694, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"DNMT3A_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="DNMT3A" & found_variants$MutationKey_Hg38 == "chr2,25240693,C,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

18.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

18.1.1 Donor Gain

Search: predicted at 2bp from the variant, chr2:25240694

Show all the splice junctions containing the positions between 25240690-25240699

colnames(GeneSJ)[grep("2524069",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

18.2 VAF

VAF of the mutated samples:

19 KMT2D chr12,49050056,G,A

Variant found in 1 patient of the BeatAML cohort (1 sample)

  • Patients with KMT2D chr12,49050056,G,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr12:49050057, not found in the splice junction collection.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KMT2D_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="KMT2D" & found_variants$MutationKey_Hg38 == "chr12,49050056,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

19.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

19.1.1 Donor Gain

Search: predicted at 2bp from the variant, chr12:49050057

Show all the splice junctions containing the variant position 49050056

colnames(GeneSJ)[grep("49050056",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

19.2 VAF

VAF of the mutated samples:

20 KMT2D chr12,49031255,G,A

Variant found in 3 patients of the BeatAML cohort (3 samples)

  • Patients with KMT2D chr12,49031255,G,A variant: 3 patients (3 samples)
  • Patients with the variant and RNASeq for validation: 3 patients (3 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted at 2bp from the variant, chr12:49031256, not found in the splice junction collection.
  • Donor Loss: chr12:49031034-49031174; donor splice site chr12:49031174.
  • Acceptor Gain: chr12:49031313-49032512, found in the mutated samples.
  • Acceptor Loss: chr12:49033965-49034066; acceptor splice site chr12:49033965.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KMT2D_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: mutated (MUT) or non-mutated (WT)

samples_df <- found_variants[found_variants$Gene=="KMT2D" & found_variants$MutationKey_Hg38 == "chr12,49031255,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

20.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

20.1.1 Donor Gain

Search: predicted at 2bp from the variant, chr12:49031256

Show all the splice junctions containing the positions between 49031250-49031259

colnames(GeneSJ)[grep("4903125",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

20.1.2 Acceptor Gain

Search: chr12:49031313

colnames(GeneSJ)[grep("4903131",colnames(GeneSJ))]
## [1] "chr12_49031313_49031501" "chr12_49031313_49031996"
## [3] "chr12_49031313_49032512" "chr12_49031313_49032561"
## [5] "chr12_49031313_49033723" "chr12_49031313_49038762"
## [7] "chr12_49031313_49041300"

Show all the splice junctions containing the position chr12:49031313-49034067

colnames(GeneSJ)[grep("49031313_49034",colnames(GeneSJ))]
## character(0)
t(GeneSJ[GeneSJ$GROUP == "MUT",c("sample_id",colnames(GeneSJ)[grep("49031313",colnames(GeneSJ))])])
##                         63        127       434      
## sample_id               "BA2134R" "BA2302R" "BA3009R"
## chr12_49031313_49031501 "0"       "0"       "0"      
## chr12_49031313_49031996 "0"       "0"       "0"      
## chr12_49031313_49032512 "2"       "0"       "2"      
## chr12_49031313_49032561 "0"       "0"       "0"      
## chr12_49031313_49033723 "0"       "0"       "0"      
## chr12_49031313_49038762 "0"       "0"       "0"      
## chr12_49031313_49041300 "0"       "0"       "0"
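Calling t() on a data frame that mixes the character sample_id with numeric counts coerces everything to character, which is why the counts above are printed in quotes. A minimal sketch (toy table with invented names and counts) that keeps the counts numeric by moving the identifiers into the column names:

```r
# Toy table: two samples, two junctions sharing the same start position.
toy <- data.frame(
  sample_id = c("S1", "S2"),
  chrN_10_20 = c(2, 0),
  chrN_10_30 = c(0, 5),
  stringsAsFactors = FALSE
)

sj_cols <- grep("^chrN_10_", names(toy), value = TRUE)

# Transpose only the numeric junction columns, then label columns by sample.
counts <- t(as.matrix(toy[, sj_cols, drop = FALSE]))
colnames(counts) <- toy$sample_id
```

The result is a numeric matrix with one row per junction and one column per sample, so the reads can still be summed or compared without converting them back from text.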

Show all the splice junctions containing the position chr12:49031313-49032512

colnames(GeneSJ)[grep("49031313_49032512",colnames(GeneSJ))]
## [1] "chr12_49031313_49032512"

Found: chr12_49031313_49032512

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr12_49031313_49032512
##   [1]  3  1  8  0  0 10  8 11 10  9  2  9  0  1  3  3 15  4  1  7  2  9  0  0  2
##  [26]  0  4  0  0  6 22  1  6  5  1  1  2  0  0  3 13  9 10  5  0 10  2  1  3  2
##  [51]  4  8  4  2  0 10  1  0  8  5  1  1  2  4  1  0  3  0  9  2 10  2  3  6  1
##  [76]  5  2  3  1  2  7  7  2  2  1  7  4  1  0  5 10  6  1  0  4  4  2  3  5  5
## [101]  8 14  4  4  6  3  3  6  2  0  8  1  2 38  0  8 17 10  5  3  2  6  2  2  2
## [126]  5  0  5  2  6  1  3  2  4  1  4  3  0  2  0  0  8  0  0  4  8  0  0  2  6
## [151]  3 18  4  3 11  4  6  2  3  3  5  6  3  5 11  1  3  2  0  2  1  6  2  0  6
## [176]  3  2 10  2  0  8  1  0  3  4  1  4  0  5  0  8  3  2  2 16  0  4  3 19  8
## [201]  9  0  1  0  3  0 14  4  2  1  5  3  1  9  1  2  2  9 13  3  1  3 13  1 10
## [226]  5  1  5  0  2  1  6  0  9  9  2  1  0  1  7  3  0 12 10  4  1  4  0  4  4
## [251]  1  7  7  6  4  0  1  1  0  1  5  5  1 11 10  0  5 12  0  6  3  0  3  4  1
## [276]  2  5  1  4  8  7  4  3  4  0  1  1  0  6  1  3  0  2  0  0  6  0  5  2 18
## [301]  9  1  4  6 21  8  1  1  0  3  4  2  2  1  0  9  0  4  5  1  0  6  5  0 15
## [326]  8  0  6  4  1  3  1  2  3  2  0  8  8  0  7  4  0 13  9  9  9  4  5  1  6
## [351]  0  1  0  1  0  1  1  1  5  4 11  1  2  0  1  7  3  2  0  9  2  3  2  0  1
## [376]  5  1  0  7  6  0  1  1  7  7  0  1  7  3  5  2 14  8  0  0  2 10  9  9  1
## [401]  7  0  1  5  2 14  0  1  8  2 13  3  2  1  4  2  6  2  8 21  8  4  3  0  4
## [426]  8  1  6  5 27  6  4  0  2  1  2  3  0  4  3  1  4 10  4  4  1  1  2  4  9
## [451]  1  5  8  4  4  0  6

Samples with the SJ of interest:

table(GeneSJ$chr12_49031313_49032512>0) 
## 
## FALSE  TRUE 
##    79   378

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr12_49031313_49032512 > 0])
## 
## MUT  WT 
##   2 376

Alternative SJ found in the mutated samples.

20.1.3 Canonical SJ

Exon upstream (UE): chr12:49033965-49034066; acceptor splice site chr12:49033965

Exon downstream (DE): chr12:49031034-49031174; donor splice site chr12:49031174

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr12_49033965_49034066
##   [1] 251 127 541  10 219 303 337 292 165 499 140 151 173 225 129 176 248 213
##  [19] 104 148 242 414 126 269 246  82 206   1 257 145 352 180 183 154 271 173
##  [37] 142 232 117 262 303 166 167 144  22 315  21 194 251  98 134 274 165 352
##  [55] 345 240 218 111 182 205 364 313  73 122 105 125 127  82 218 244 301  95
##  [73] 219 210 209 260 409 256 145 144 244 140 369 140 195 221 110  63 178 171
##  [91] 383 353 341 330 311 126 121 193 180 288 386 300 149 140 115 136 201 100
## [109] 146 158 222 162 227 398  62 223 397 242 205 114 350 139 180 206 331 317
## [127]  84  99 171 275 131 249 104 212  82 169 161 145 102  75 102 208 412 169
## [145] 155 392  72 259 158 340 128 507 358 146 390  77 168 231  97 144 169 216
## [163] 185 290 267  85 368 163 162 108  84 259  91  92 321 160 289 235 152 123
## [181] 315 126 318 149 281  83 336  98 147 395 275 351 273 140 259 160 283 188
## [199] 357 194 196 107 533 212 100 101 310 263  71 112 278  91 123 375 183 141
## [217]  70 254 158  59 186 154 272 195 308 134 100  82  83 354 161 407 195 103
## [235] 184 173  94 280  62 247 341  89 336 290 268 311  91   0 134 102 178 152
## [253] 184 243  90 129 317 213 253  90 163 290 128 227 317 164 366 167 115 190
## [271] 298 118 131 189 163 104 134 172 200 308  92 196 110 242  50 215  86 233
## [289] 252 109 284 252 164  38  68 155 194 227  84 298 213 534  97 163 198 321
## [307]  97 115 157 144 207  41 125  77 164 178  78 223 329 276 216 271 153  95
## [325] 267 243 126 205  69 149 209  56 188 166 134  71 350 171  62 139 165 213
## [343] 199 303 121 331 143 283 126 192 178 215 197 173  83 226 126 160 151 120
## [361] 379 137 284 192 102 107 178 191 329 107 121 207 139 105 458 174 231 395
## [379] 207 157   1 131  87 101 334 220 111 161 142 158 210 284 222  17  84 136
## [397] 325 268 112 217 240 395 138 192 110 310  54 249 322 221 192 110 113  96
## [415] 101 205 274 218 211 261 305 256 365 187  87 153 233 338 296 337 353  95
## [433]   0 144  93 132 131  53  90 217 167 200 267 206 157 202   6 182 188 187
## [451] 249 405 409 104 222 194 214

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chr12_49031034_49031174
##   [1] 203  54 416   9 128 269 240 143 134 345 142 128 165  82  94 138 214 137
##  [19] 100 121 113 236  81 137 109  64 124   6 219 172 256  92 145 141 204 135
##  [37]  96  84  94 110 247 123 160 110  24 156  26 113 138  70  91 118 109 242
##  [55] 194 169 117  78 142 148 172 104  63  65  93  74  88  60 169 199 223  62
##  [73] 106 145 124 178 195 165  73  99 158 116 163 116 107 161  60  39 135 100
##  [91] 235 219 171 157 166  88 102 201 146 199 256 193 120  88  73 128 133  78
## [109]  70  84 150  85 116 319  60 159 278 160 138 113 143  93 138 130 186 142
## [127]  57  99 103 175  91 142  86 139  38 126 124  79  83  56  59 181 173  98
## [145] 141 136  61 130 138 260  97 338 213  97 265  52 136 148  71 106 125 153
## [163] 138 170 178  89 177 102 133 102  73 190  52  58 199 117 214 160 136 103
## [181] 198 107 211  77  91  54 233  69 141 214 169 164  73 132 213  90 181 172
## [199] 220 209 149  56 246 150 100  68 199 193  48  93 158  89  68 260 108  76
## [217]  55 196 110  41 148  93 159 130 220 132 107  75  81 240 125 234 104  96
## [235] 126 121 124 190  43 151 209  72 185 226 173 118  72   3  88  80  98 105
## [253] 109 220  81 146 189 183 166  85 158 184  87 198 193  88 236 116  95 129
## [271] 185  76 143 128 126 106 141 158 148 233  90 141 109  90  60 160  57  89
## [289] 178  85 163 121 123  46  63 134  52 208  90 189 164 267  73 119 161 188
## [307]  52  83 110 112 138  39  87  79 129 135  65 155 200 158 132 202 107  78
## [325] 211 224 107 155  75 143 189  48  75 116  95  51 257 112  44 123 104 150
## [343] 121 190 127 243 129 213  58 160 111 111 111 112  63 125 116 126 110  96
## [361] 254  80 124 115  72 109 111 118 169  81 120 115 125  89 165 101 174 212
## [379] 158 146   0 108  76  78 169 181  98 165 136 167 172 198 163  21  64 149
## [397] 261 194 107 126 184 233  96 134  64 231  91 137 204 135 152  92 135  52
## [415]  83 103 181 154 144 144 184 173 204 103  78 113 105 208 194 247 177  80
## [433]   2  89  55  87  87  42  99 101  97 138 211 148 136 144  10 132 162 174
## [451] 151 288 204  82 106  87 145

20.2 Normalization

Sum the reads across all splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalize each junction's read count by the total read count of all splice junctions of the gene, expressed as a percentage:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chr12_49031034_49031174)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_CanonUE <- (GeneSJ$chr12_49033965_49034066)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_AG <- (GeneSJ$chr12_49031313_49032512)/GeneSJ$rowSum_SJtotal*100
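
The normalization above can be sketched on a toy table. This is a minimal illustration, not the report's code: the counts are made up, and `normalize_sj` is a hypothetical helper. It also shows that a sample with no junction reads at all yields `NaN`, which `mean(..., na.rm=TRUE)` later drops.

```r
# Toy junction-count table; column names follow the chr_SJstart_SJend convention.
# All counts are made up for illustration.
toy <- data.frame(
  sample_id     = c("S1", "S2", "S3"),
  chr12_100_200 = c(90, 45, 0),
  chr12_100_300 = c(10, 5, 0),
  chr12_250_300 = c(100, 50, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of the gene's junction reads
# that support one junction, per sample.
normalize_sj <- function(df, sj_col) {
  sj_cols <- grep("^chr", names(df), value = TRUE)  # junction columns only
  total   <- rowSums(df[, sj_cols, drop = FALSE])   # per-sample junction total
  df[[sj_col]] / total * 100
}

toy$Normalized_alt <- normalize_sj(toy, "chr12_100_300")
toy$Normalized_alt
```

S1 and S2 have different depths but the same junction share, so both normalize to the same percentage. S3 has a zero total and becomes `NaN`.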

Download the normalized values for the assessed splice junctions of all the AML samples:

20.3 VAF

VAF of the mutated samples:

20.4 Plots

20.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

20.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

20.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Acceptor Loss:

Donor Loss:

Acceptor Gain:

20.5 Statistical Analysis

SJCounts <- GeneSJ

20.5.1 Donor Loss

20.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.98286, p-value = 3.366e-05

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 2.346576

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 2.458057 1.932859 2.051164

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT samples: deviation of each mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  0.1114811 -0.4137173 -0.2954120

20.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:452] = -2.3466, -1.4753, -1.2799,  ..., 1.4057, 1.6941
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6013216 0.1696035 0.2422907
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr12:49031034-49031174"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
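
For a loss event, the empirical p-value used above is the fraction of WT samples whose deviation is at or below that of the mutated sample. A minimal sketch with made-up values (none of these numbers come from the cohort):

```r
# Made-up normalized expression values, for illustration only.
wt_values  <- c(2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.35, 2.45)
mut_values <- c(1.5, 2.32)

# Deviation from the WT mean, as in the report.
mean_wt  <- mean(wt_values, na.rm = TRUE)
wt_diff  <- wt_values - mean_wt
mut_diff <- mut_values - mean_wt

# ecdf() returns a step function: F(x) = fraction of WT deviations <= x.
wt_ecdf <- ecdf(wt_diff)

# Left-tail empirical p-value: low values suggest reduced junction usage.
p_loss <- wt_ecdf(mut_diff)
p_loss

# Subtracting a constant does not change ranks, so the ECDF of the
# raw values gives the same answer.
p_loss_raw <- ecdf(wt_values)(mut_values)
```

The first toy sample lies below every WT value, so its p-value is 0. The second sits above four of the ten WT values, giving 0.4. The smallest non-zero p-value this approach can report is 1 divided by the number of WT samples.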

Download the VAF, inferred percentiles, and p-values of the mutated samples:

20.5.2 Acceptor Loss

20.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"]
## W = 0.91571, p-value = 3.157e-15

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonUE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 3.337412

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonUE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 2.848225 2.848423 3.318737

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonUE - mean_WT_SJi

Difference in the MUT samples: deviation of each mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.48918776 -0.48898930 -0.01867546

20.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:451] = -3.3374, -3.0627, -2.6438,  ..., 1.2966, 1.3287
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.1519824 0.1519824 0.4713656
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonUE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr12:49033965-49034066"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles, and p-values of the mutated samples:

20.5.3 Acceptor Gain

20.5.3.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_AG[SJCounts$GROUP == "WT"]
## W = 0.89952, p-value < 2.2e-16

Mean normalized expression of the alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_AG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.06900696

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_AG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.07803355 0.00000000 0.04609357

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_AG - mean_WT_SJi

Difference in the MUT samples: deviation of each mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1]  0.009026599 -0.069006955 -0.022913385

20.5.3.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:374] = -0.069007, -0.061605, -0.06126,  ..., 0.27096, 0.2882
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.6167401 0.1718062 0.4317181
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_AG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Aceptor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr12:49031313-49032512"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles, and p-values of the mutated samples:

21 TET2 chr4,105259774,G,A

Variant found in 1 patient of the BeatAML cohort (1 sample).

  • Patients with TET2 chr4,105259774,G,A variant: 1 patient (1 sample)
  • Patients with the variant and RNASeq for validation: 1 patient (1 sample)

The splicing alterations being assessed are:

  • Acceptor Gain: predicted at 38 bp from the variant, chr4:105259811, not found in the splice junction collection.
  • Donor Loss: chr4:105259770-105261758; donor splice site chr4:105259770.
  • Donor Gain: chr4:105259708-105261758, found in the mutated samples.
  • Exon Skipping: chr4:105243779-105261758, not found in the splice junction collection.

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"TET2_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="TET2" & found_variants$MutationKey_Hg38 == "chr4,105259774,G,A",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")

21.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

21.1.1 Acceptor Gain

Search: predicted at 38bp from the variant, chr4:105259811

Show all splice junctions with a boundary in the range 105259810-105259819:

colnames(GeneSJ)[grep("10525981",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.
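
The `grep` on the truncated prefix `10525981` matches any column name containing that substring, which covers boundaries from 105259810 to 105259819. A hedged alternative, not used in the report, is to parse the column names into numeric coordinates so the window is explicit. In this sketch `in_window` is a hypothetical helper and the last junction name is invented for illustration:

```r
# Junction names in the chr_SJstart_SJend format.
# The last entry is made up so that the window search has a hit.
sj_names <- c("chr4_105243779_105259618", "chr4_105259708_105261758",
              "chr4_105259770_105261758", "chr4_105259815_105261758")

# Split each name into chromosome, start, and end.
parts <- do.call(rbind, strsplit(sj_names, "_", fixed = TRUE))
sj <- data.frame(name  = sj_names,
                 chr   = parts[, 1],
                 start = as.numeric(parts[, 2]),
                 end   = as.numeric(parts[, 3]),
                 stringsAsFactors = FALSE)

# Hypothetical helper: junctions with either boundary inside [lo, hi].
in_window <- function(sj, lo, hi) {
  hit <- (sj$start >= lo & sj$start <= hi) | (sj$end >= lo & sj$end <= hi)
  sj$name[hit]
}

in_window(sj, 105259810, 105259819)
```

A numeric window can be widened or narrowed freely, whereas a substring search is tied to decimal prefixes.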

21.1.2 Donor Gain

Search: chr4:105259708-105261758

Show all splice junctions matching chr4:105259708-105261758:

colnames(GeneSJ)[grep("105259708_105261758",colnames(GeneSJ))]
## [1] "chr4_105259708_105261758"

Found: chr4_105259708_105261758

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr4_105259708_105261758
##   [1]  1  1  0  0  0  0  0  1  0  1  1  0  0  0  0  2  0  1  0  0  0  0  0  0  0
##  [26]  0  0  0  0  0  1  1  0  0  2  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0
##  [51]  0  0  0  0  0  0  0  0  0  0  1  0  2  0  3  0  0  0  0  0  0  0  0  3  0
##  [76]  0  0  0  0  0  0  0  2  2  0  0  0  2  0  0  0  0  0  0  0  0  1  0  0  1
## [101]  0  0  0  0  2  1  0  0  0  0  0  2  0  0  0  2  0  0  0  0  0  1  0  0  0
## [126]  0  0  0  0  0  0  0  0  0  0  2  0  0  1  0  0  0  0  0  0  0  2  0  0  0
## [151]  0  0  0  0  2  2  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0
## [176]  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  1  0  1  0  0  0  0  0
## [201]  1  0  0  0  0  0  1  0  0  0  0  0  0  2  0  0  0  0  1  1  0  0  0  0  0
## [226]  0  1  0  0  0  1  0  0  0  1  0  0  0  1  0  3  0  0  4  0  0  0  0  0  0
## [251]  0  1  0  0  1  0  0  0  0  0  1  0  0  1  0  0  0  2  0  0  0  0  0  0  0
## [276]  0  0  0 31  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  2  0  0  0  1
## [301]  0  0  0  1  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  2  1  0  1
## [326]  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1
## [351]  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  0  1  0  0  1  0  0
## [376]  1  0  0  3  0  0  0  0  0  2  0  0  0  0  0  0  1  0  0  0  0  0  1  0  2
## [401]  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  2  0  1  0  0  0  0
## [426]  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  4  0  0  0  0  0  0  0
## [451]  0  0  0  0  0  0  0

Samples with the SJ of interest:

table(GeneSJ$chr4_105259708_105261758>0) 
## 
## FALSE  TRUE 
##   374    83

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr4_105259708_105261758 > 0])
## 
## MUT  WT 
##   1  82

Alternative SJ found in the mutated samples.
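
Presence of the junction alone does not separate the groups when many WT samples carry it at low read counts. A small sketch on made-up data shows one way to view presence and read depth together. The values are illustrative and are not taken from the cohort:

```r
# Made-up read counts for one alternative junction.
toy <- data.frame(
  GROUP  = c("MUT", "WT", "WT", "WT", "WT"),
  alt_sj = c(31, 0, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate group against junction presence.
tab <- table(group = toy$GROUP, has_sj = toy$alt_sj > 0)
tab

# Read counts among the samples that carry the junction, by group.
carriers <- toy$alt_sj > 0
split(toy$alt_sj[carriers], toy$GROUP[carriers])
```

This is why the report goes on to compare normalized expression rather than stopping at presence or absence.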

21.1.3 Exon Skipping

Search: chr4:105243779-105261758

Show all splice junctions matching chr4:105243779-105261758:

colnames(GeneSJ)[grep("105243779_105261758",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

21.1.4 Canonical SJ

Exon 6-7: chr4:105243779-105259618

Exon 7-8: chr4:105259770-105261758; donor splice site chr4:105259770

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr4_105243779_105259618
##   [1]  46  13  84   1  13   1  20  33  34  45  66  44  19   8  14  36 101  61
##  [19]  15  37   3  31  15  49   9   7  54   1   9  71  60  33  60  54  41  17
##  [37]  14  16  17   7  59  23  64  23   8  29   6  15  27  39   9  20  68  28
##  [55]  72  32  12  10 100  45  29   3  32   3  73  18  19   9  22  41  29   5
##  [73]   4  63  19  61   1  54  11  45  24  50   8  52   9  19   3  20  47  29
##  [91]  84  30  30  22  10 109  45  71  71  52  51  41  23   6  43  55   9  45
## [109]  16  28  35  56   8  19  17  55  33  19  57  22   8  28  44  48  50  45
## [127]   5  55  27  10  63  38  46  25  40  47  18  21  12  22  16  52  13   5
## [145]  15  24  13  17  56  38  74 107  28  27  43  19  76  30   7  25  66  79
## [163]  45  17  18  44   4   7  29  20  50  43  17  16  29  11  39  34  53  13
## [181]  25  21  16  51  15  11  39   1  59  13   7  17  14  24  60  15  80  51
## [199]  46  63  39   6  16  22  72  10  43  51  51  31   9  33   9 106  19   9
## [217]   7  55  30  24  13  17  54  23  27  12  72  15  38  62  54  12  16  20
## [235]  33  37  19  13  73  21  50  13  37  57  71   7  17   3  21  11  12  30
## [253]  58 101  14  50  57  88  35  14  49  21  35  43  41  16  44  18   9  29
## [271]  39   7  53  34  42  39  26  39  51  12  35  38  30   9   9  72   7   5
## [289]  33  13  23  32  52  24  21  74  41  46   6  74  18  10  17  32  14  64
## [307]  36  12  13  12  58   9  68  10  64  48   9  39  27  24  29  99  25   3
## [325]  66  15 101  59  16  43  54  25   9  16  13  21  24  20  15  30  27  17
## [343]  76  30  14  22  34  43   5  33  21  20  63  29   3   7   4  28  26   4
## [361]  20   6  42   3  10  37  35  20  12  42  29  16  54  13  23  44  47   5
## [379]  55  16   1  28  12  18  46  54   3  34  26  50  19  29  49   1   3  31
## [397]  33  38  18  66  55  25  34  34  13  20   7  14  35  29  76  14  81   9
## [415]  23  23  31  32  59   1  23  29   4  21  27  12   7  13  28  91  33  28
## [433]   0  17  15  37  23  16  15  32   6  17 106  24  17  77  15  36  56  45
## [451]  62  52  41   6   8   2  55

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr4_105259770_105261758
##   [1] 114  14 222   2  68   7  76  96  71 199 139  69  53  32  73  70 175 136
##  [19]  45  67  28  58  36 127  22  17 146   3  58 123 171  98  99  84 205  60
##  [37]  65  61  41  53 160  68  95  39  26  94  15  58  65  94  23  55 133  92
##  [55] 189 119  16  44 263  89  89  47  71  42 122  44  50  12  50 100  66  31
##  [73]  36 157  30 178  18 209  36  59  74  97  68 112  48  62  17  26  93  66
##  [91] 239  70 122  60  85 226 110 146 126 113 151 146  58  31 108  97  33  74
## [109]  57  67 169 135  32  86  75 106 132  52 152  58  44  43  87 138 143 153
## [127]  18 100  90  35 168  73  75 150  77  90  43  82  40  67  35 128  43  19
## [145]  41  64  38  43  99  60 175 161 111  55 210  53 208  81  23 107 140 198
## [163] 102  62 116  82  15  33  55  33 132 116  64  35  82  36  99  68 103  31
## [181]  76  64  55  62  60  21 165   5  73  65  59  62  51  57 127  36 148 108
## [199] 145 102  82  26  50  56  86  45 160 100 109  79  42  49  29 135  40  35
## [217]   8 141  66  57  40  39 165  44  73  43 142  37  50 150  55  39  62  70
## [235]  97  92  62  33 115  65 106  61  72 153 208  39  53   4  69  30  34  74
## [253]  98 198  33 139 162 189  92  17 103  71  81  86 125  76 125  48  21  78
## [271]  88  34 102 125 151  54 103  85  71  55  62  82 116  57  23 145  27  50
## [289] 152  59 103 114 116  50  39 110  67  69  16 138  39  37  78  77  36 186
## [307]  70  35  24  32 128  20 170  18  98  59  39  98  85  90  69 171 110  12
## [325] 130  65 183 174  68 139 106  67  29  29  54  42  76  38  54  83  86  88
## [343] 137 126  26  97  92  88  31  78  88  68 135  78  19  23  15  59 114  69
## [361]  67  31 100  31  36  70  67  45  95  87  55  85  81  62 120 151 152  52
## [379] 120  71   0  69  36  66  92 125  20  73  50  97  34  67 183   0  39  62
## [397]   0  87  45 214 106 167  75  59  47  58  15  59 119  46 108  42  92  31
## [415]  39  77  81  79 148   7 113  46  67  50  62  43  36  80  78 163 142  41
## [433]   0  67  74  93  49  30  22  86  29  47 214  87  39 122  22  67 103  96
## [451] 118 133 120  25  43  28 115

21.2 Normalization

Sum the reads across all splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalize each junction's read count by the total read count of all splice junctions of the gene, expressed as a percentage:

GeneSJ$Normalized_CanonEx7_8 <- (GeneSJ$chr4_105259770_105261758)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_Ex7DG <- (GeneSJ$chr4_105259708_105261758)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

21.3 VAF

VAF of the mutated sample:

21.4 Plots

21.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

21.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

21.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Donor Loss:

Donor Gain:

21.5 Statistical Analysis

SJCounts <- GeneSJ

21.5.1 Donor Loss

21.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "WT"]
## W = 0.91829, p-value = 5.209e-15

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 10.47262

Normalized expression of the canonical SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_CanonEx7_8[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 7.802198

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx7_8 - mean_WT_SJi

Difference in the MUT sample: deviation of the mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -2.670419

21.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:449] = -10.473, -9.7634, -7.9903,  ..., 6.9692, 20.297
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.1469298
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx7_8")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr4:105259770-105261758"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles, and p-values of the mutated samples:

21.5.2 Donor Gain

21.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_Ex7DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_Ex7DG[SJCounts$GROUP == "WT"]
## W = 0.44616, p-value < 2.2e-16
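
A very low W is expected for a junction that is absent in most WT samples, because the distribution is dominated by zeros. The sketch below uses a made-up zero-inflated vector to illustrate why the rank-based ECDF comparison that follows is a reasonable choice over a test that assumes normality:

```r
# Made-up zero-inflated values: most samples have no reads for the junction.
x <- c(rep(0, 80), 0.02, 0.03, 0.05, 0.1, 0.5)

# Shapiro-Wilk test; H0 is that the data come from a normal distribution.
res <- shapiro.test(x)
res$statistic  # W, well below 1 for strongly non-normal data
res$p.value    # small p-value: reject normality

# Fraction of zeros, which drives the departure from normality.
zero_fraction <- mean(x == 0)
zero_fraction
```

An empirical CDF makes no distributional assumption, so it remains usable for distributions like this one.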

Mean normalized expression of the alternative SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_Ex7DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.03142559

Normalized expression of the alternative SJ in the MUT sample:

MUT_SJi <- SJCounts$Normalized_Ex7DG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 3.406593

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_Ex7DG - mean_WT_SJi

Difference in the MUT sample: deviation of the mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 3.375168

21.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:81] = -0.031426, 0.024881, 0.033383,  ..., 0.51502, 0.57833
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 1
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex7DG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr4:105259708-105261758"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
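
For a gain event the p-value is taken from the right tail, `1 - ECDF`. When the mutated sample exceeds every WT sample the ECDF is 1 and the p-value is exactly 0, which only means that no WT sample was higher. The sketch below reproduces that situation with made-up values and then shows an add-one estimate as one possible alternative. The add-one estimate is not part of the report's method:

```r
# Made-up normalized expression of an alternative junction.
wt_values <- c(0, 0, 0, 0.02, 0, 0.05, 0, 0.03, 0, 0)
mut_value <- 3.4

# Right-tail empirical p-value as computed in the report:
# fraction of WT samples strictly greater than the mutated sample.
wt_ecdf <- ecdf(wt_values)
p_gain  <- 1 - wt_ecdf(mut_value)
p_gain

# Alternative (an assumption, not the report's method): add-one estimate,
# which counts WT values >= the mutated value and never returns exactly 0.
n_wt       <- sum(!is.na(wt_values))
p_gain_adj <- (sum(wt_values >= mut_value, na.rm = TRUE) + 1) / (n_wt + 1)
p_gain_adj
```

With 10 WT values the add-one estimate is bounded below by 1/11, which makes the resolution limit imposed by the reference sample size explicit.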

Download the VAF, inferred percentiles, and p-values of the mutated samples:

22 PTPN11 chr12,112489084,G,T

Variant found in 3 patients of the BeatAML cohort (3 samples).

  • Patients with PTPN11 chr12,112489084,G,T variant: 3 patients (3 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Gain: chr12:112489079-112502143, found in the mutated samples.
  • Donor Loss: chr12:112489176-112502143; donor splice site chr12:112489176

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"PTPN11_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Assign each sample to a group: mutated (MUT) or non-mutated (WT).

samples_df <- found_variants[found_variants$Gene=="PTPN11" & found_variants$MutationKey_Hg38 == "chr12,112489084,G,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
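
The grouping logic can be checked on a toy table. In this sketch the sample IDs and the "Not validable" label are hypothetical; only the column names and the variant key follow the code above. Note that WT here means "not carrying this particular variant with usable RNASeq", so a sample mutated elsewhere is still WT for this comparison.

```r
# Toy variant table; sample IDs and the "Not validable" label are made up.
found_variants <- data.frame(
  Gene = c("PTPN11", "PTPN11", "PTPN11", "TET2"),
  MutationKey_Hg38 = c("chr12,112489084,G,T", "chr12,112489084,G,T",
                       "chr12,112489084,G,T", "chr4,105259774,G,A"),
  RNA_Sample = c("R1", "R2", NA, "R3"),
  Validable  = c("Validable", "Validable", "Not validable", "Validable"),
  stringsAsFactors = FALSE
)
GeneSJ <- data.frame(sample_id = c("R1", "R2", "R3", "R4"),
                     stringsAsFactors = FALSE)

# Rows for the variant of interest.
is_variant <- found_variants$Gene == "PTPN11" &
  found_variants$MutationKey_Hg38 == "chr12,112489084,G,T"
samples_df <- found_variants[is_variant, ]

# Only samples with RNASeq available for validation count as MUT.
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases, "MUT", "WT")
table(GeneSJ$GROUP)
```

Three carriers with two validable samples gives two MUT samples, mirroring the patient and sample counts listed for this variant.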

22.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

22.1.1 Donor Gain

Search: chr12:112489079-112502143

colnames(GeneSJ)[grep("112489079",colnames(GeneSJ))]
## [1] "chr12_112489079_112502143" "chr12_112489079_112502148"
## [3] "chr12_112489079_112504694" "chr12_112489079_112505824"

Show all splice junctions matching chr12:112489079-112502143:

colnames(GeneSJ)[grep("112489079_112502143",colnames(GeneSJ))]
## [1] "chr12_112489079_112502143"

Found: chr12_112489079_112502143
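
Searching on the donor position alone returns every junction that shares it, so the second, more specific pattern is needed to isolate the junction of interest. A minimal sketch of the difference, using the junction names shown above plus the canonical exon 13-14 junction:

```r
# Junction column names taken from the lookups in this section.
sj_names <- c("chr12_112489079_112502143", "chr12_112489079_112502148",
              "chr12_112489079_112504694", "chr12_112489079_112505824",
              "chr12_112489176_112502143")

# Substring search on one boundary: every junction sharing that position.
shared_donor <- grep("112489079", sj_names, value = TRUE)
shared_donor

# Exact match on the full column name: a single junction.
target <- "chr12_112489079_112502143"
exact <- sj_names[sj_names == target]
exact

# Equivalent anchored pattern, if a regular expression is preferred.
anchored <- grep("_112489079_112502143$", sj_names, value = TRUE)
```

An exact or anchored match avoids accidental hits when one coordinate string happens to be a substring of another.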

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr12_112489079_112502143
##   [1]  0  0  0  1  0  0  0  1  0  2  0  0  1  0  0  1  1  1  3  0  2  0  1  3  0
##  [26]  1  0  0  3  0  0  5  0  0  1  3  0  1  0  1  0  0  2  2  0  2  0  1  0  0
##  [51]  0  0  0  0  2  1  0  1  2  0  1  1  2  0  0  0  0  0  0  0  0  0  5  0  0
##  [76]  0  0  1  0  1  2  0  2  0  0  1  0  0  0  1  3  2  1  1  2  1  0  0  0  0
## [101]  0  0  0  0  0  1  1  0  0  0  0  0  3  1  0  0  0  1  2  0  0  0  0  2  1
## [126]  1  2  1  1  3  1  1  0  0  0  2  1  0  0  0  0  1  0  0  0  0  0  1  1  1
## [151]  0  0  0  0  0  2  0  0  1  0  0  1  1  0  1  0  0  3  0  0  0  0  2  0  0
## [176]  0  1  0  0  0  0  0  3  2  1  1  2  0  0  4  1  3  0  0  0  2  0  5  0  0
## [201]  0  1  0  1  0  0  0  0  0  3  2  1  1  1  0  0  0  0  0  1  2  0  1  1  2
## [226]  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  3  0  1  1  0  2  0  0  0
## [251]  0  2  0  0  0  1  0  0  1  0  0 10  0  0  0  1  1  3  2  4  5  1  0  0  0
## [276]  0  0  0  0  1  2  0  2  1  2  0  0  5  1  0  2  2  0  0  0  0  0  0  0  0
## [301]  0  1  0  0  1  0  2  0  0  0  1  0  0  0  0  0  0  0  1  2  0  1  2  0  2
## [326]  0  0  0  1  0  1  0  1  0  1  0  0  0  2  3  0  0  2  0  0  0  0  0  2  0
## [351]  0  2  1  3  2  0  1  0  0  2  0  2  2  0  0  0  1  5  1  2  0  0  0  2  0
## [376]  1  0  1  0  1  0  0  0  3  0  0  0  0  0  0  0  0  2  0  3  0  0  4  3  0
## [401]  0  0  1  0  3  0  1  2  3  0  0  0  0  0  0  0  0  2  0  3  0  1  1  3  0
## [426]  0  0  2  0  0  0  0  0  2  0  0  0  2  2  1  1  0  0  0  0  0  0  1  0  0
## [451]  0  0  1  0  0  1  0

Samples with the SJ of interest:

table(GeneSJ$chr12_112489079_112502143>0) 
## 
## FALSE  TRUE 
##   282   175

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chr12_112489079_112502143 > 0])
## 
## MUT  WT 
##   2 173

Alternative SJ found in the mutated samples.

22.1.2 Canonical SJ

Exon 12-13: chr12:112488511-112489023

Exon 13-14: chr12:112489176-112502143; splice site chr12:112489176

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr12_112488511_112489023
##   [1] 156  53 242   3  96 236 300 249  93 150 247 148 119 163 191 256 183 363
##  [19] 170 138 207 224 102 175 136  99 121   1 101 216 307 107 164 134  95 109
##  [37] 113 123  89 237 263 222 163 202  58 220  70  81 111  75  84  85  78 275
##  [55] 318 282  56 253 126 195 195  70 101 185  88 116 241 110 133 196 339  88
##  [73] 259 175 133 206 196 353  92 240 266 103 203 165 124 197  62 144  96 129
##  [91] 362 196 277 184 105 196 148 133  92 325 118 205 260 147 200 206 184 126
## [109] 137  75 209 110 170 368  73 248 388 187  74 132 124 139 153 209 398 126
## [127]  99 115 198 127 132 129  80 227 113 111  98  79 159  37  58 210 210  85
## [145] 145 171  58 134 232 191 133 448 157 135 211 136 285 204  81 187 159 192
## [163]  90 150 317  80 119  76 133 115 106 252 136  81 112 213 471 242  98 184
## [181] 183 200 415  94 107  98 157  45 148 169 203 133 148  67 258  55 138 170
## [199] 256 132 160  99 192 238 127 159 317 244 136 151 131 130  48 204 122 103
## [217]  68 216 149 133  86 113 132 192 220 149 126  77 138 203 164 125 102  57
## [235] 205 134  47 217  83 198 190 201 218 207 143 137 152   2 161 113 158  83
## [253] 164 155 142 194 112 140 278  76 143 162 156 186 168 159 195 159 105 261
## [271] 391 113 160 228 179 131 104 121 135 192 173 145 103 206  49 213  52 102
## [289]  71  91 147 447 204 101 117 146 296 149  94 238 299 298  93 163 190 217
## [307]  75  82  96  75 177  31 159  80 167  88  86  78 219 133  83 133  94  64
## [325] 156 245 153 107 208  48 184 137 145 105 155 178 230 190 104  84  89 217
## [343] 291 349 189 199 127 240 121 176  73 178 248 114  99 103  89 132 166 129
## [361] 317  92 180  78 107 133 116 135 182 180 191 190 175  82 129 156 164 280
## [379] 199 210   2  79  51  71  45 191 167 164 163  95 206 170 150  11 118 153
## [397]  83 173 115 167 147  97  72 153  79 456  63 159 279 171 230  60 102  38
## [415] 189 114 243 154 161 167 142 191 222 174 114 131 110 233 169 208 180 105
## [433]   1 166 136 115 143 122 171  65  91 197 240  84 207  70   9  94 216 233
## [451] 118 229 201 120  95  65 242

Read counts for the splice junction in all AML samples (mutated and non-mutated):

GeneSJ$chr12_112489176_112502143
##   [1] 274  65 339   1 120 419 538 394 139 260 375 238 196 219 323 412 222 574
##  [19] 231 206 163 389 144 174 162 154 193   3 125 249 467 181 235 143 118 160
##  [37] 178 226 148 323 446 356 260 268  67 353  27  96 174 103 113 128  86 499
##  [55] 419 424  80 281 239 292 236 119 132 242 167 184 370 209 215 301 473 157
##  [73] 296 347 159 354 277 605 130 374 411 153 319 283 183 287  84 210 126 204
##  [91] 585 317 426 329 149 333 175 182 128 505 193 310 322 191 355 263 301 198
## [109] 159 113 312 212 297 504  99 399 580 249 162 185 136 161 207 322 638 221
## [127] 136 179 327 212 222 214 127 303 184 148 139 115 233  46  77 315 286 136
## [145] 186 210 105 200 383 295 197 623 261 218 324 258 421 300 110 288 223 237
## [163] 111 305 361 105 173 128 191 183 110 454 198 123 173 279 759 311 164 217
## [181] 285 252 607 123 197 178 240  66 213 248 303 162 215 103 351  82 198 219
## [199] 360 152 217 165 319 232 200 239 413 420 223 188 218 171  74 345 128 108
## [217]  98 359 222 170  95 135 235 268 359 186 231 126 220 318 211 199 170  96
## [235] 299 187  50 296 105 302 248 259 326 355 243 218 154   3 327 145 183 140
## [253] 234 310 185 237 188 189 392 102 228 209 197 324 269 243 377 280 128 420
## [271] 448 144 211 333 296 177 194 154 251 271 347 174 181 294  69 259  67 177
## [289]  84 113 192 295 288 117 190 195 233 268 121 153 419 398 130 206 361 318
## [307] 130  93 126 100 284  63 236 157 233 156 137 127 364 183 150 212 137  76
## [325] 276 372 153 179 341  81 220 196 206 172 204 245 391 307 181 125 155 309
## [343] 201 616 229 262 185 403 178 215 111 346 272 182 124 115 122 173 283 228
## [361] 428 105 233 114  88 221 203 235 270 226 186 353 208 106 151 286 206 318
## [379] 343 273   1 124  96  87  66 313 317 207 285 139 307 265 253   2 157 195
## [397] 104 284 208 225 240 149 119 205  92 708 103 231 391 273 346  63 125  54
## [415] 252 115 296 278 223 264 217 216 277 248 185 206 114 368 233 407 307 147
## [433]   0 271 217 168 189 141 212  91 114 241 289 160 303 122  14 120 265 358
## [451] 208 289 301 164 135  70 335

22.2 Normalization

Sum the reads across all splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalize each junction's read count by the total read count of all splice junctions of the gene, expressed as a percentage:

GeneSJ$Normalized_CanonEx13_14 <- (GeneSJ$chr12_112489176_112502143)/GeneSJ$rowSum_SJtotal*100

GeneSJ$Normalized_Ex13DG <- (GeneSJ$chr12_112489079_112502143)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

22.3 VAF

VAF of the mutated samples:

22.4 Plots

22.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

22.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

22.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Donor Loss:

Donor Gain:

22.5 Statistical Analysis

SJCounts <- GeneSJ

22.5.1 Donor Loss

22.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonEx13_14[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonEx13_14[SJCounts$GROUP == "WT"]
## W = 0.91555, p-value = 2.918e-15

Mean normalized expression of the canonical SJ in WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonEx13_14[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 8.050041

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonEx13_14[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 7.403471 7.817386

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonEx13_14 - mean_WT_SJi

Difference in the MUT samples: deviation of each mutated sample's normalized expression from the mean normalized WT expression.

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.6465693 -0.2326549

22.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:452] =  -8.05, -7.3458, -6.9631,  ..., 4.0852,  11.95
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.3296703 0.4483516
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonEx13_14")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chr12:112489176-112502143"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
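
The lower-tail inference used for Donor Loss can be illustrated with made-up numbers. The sketch below assumes nothing beyond base R's `ecdf()`; the WT and MUT values are invented. It also shows that subtracting the WT mean is a pure shift, so the percentiles are the same whether the ECDF is built on the differences or on the normalized values themselves.

```r
# Made-up normalized expression of a canonical junction in WT samples.
wt <- c(6.0, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0)
mut <- c(6.5, 9.2)

# Shifting by the WT mean does not change ranks, so the ECDF of the
# differences gives the same percentiles as the ECDF of the raw values.
wt_ecdf_diff <- ecdf(wt - mean(wt))
wt_ecdf_raw <- ecdf(wt)

# Lower-tail value: fraction of WT samples at or below each MUT sample
# (here 1 of 8 and 6 of 8 WT values).
p_lower <- wt_ecdf_diff(mut - mean(wt))
p_lower
```

A small value means the mutated sample expresses the canonical junction less than almost all WT samples, which is the pattern expected when the donor site is lost.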

Download the VAF, inferred percentiles and p-values of the mutated samples:

22.5.2 Donor Gain

22.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_Ex13DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_Ex13DG[SJCounts$GROUP == "WT"]
## W = 0.53459, p-value < 2.2e-16

Mean normalized expression of the alternative SJ in the WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_Ex13DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.02726744

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_Ex13DG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.3542331 0.1876173

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_Ex13DG - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.3269656 0.1603498

22.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:172] = -0.027267, -0.013522, -0.013448,  ..., 0.2591, 0.67696
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.9978022 0.9846154
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_Ex13DG")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chr12:112489079-112502143"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)
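
For Donor Gain the upper tail is used, and `1 - ECDF` equals the fraction of WT samples strictly above the mutated value. The sketch below, with made-up data, contrasts this with a tie-inclusive estimate that adds one to numerator and denominator. That second estimator is a hypothetical alternative shown for comparison only; it is not part of the analysis in this report.

```r
# Made-up normalized expression of an alternative junction in WT samples;
# most samples have no reads for it, as is common for rare junctions.
wt <- c(0, 0, 0, 0, 0, 0, 0.01, 0.02, 0.05, 0.30)
mut <- c(0.30, 0.50)

wt_ecdf <- ecdf(wt)

# Upper tail as 1 - ECDF: fraction of WT samples strictly above the value.
p_strict <- 1 - wt_ecdf(mut)

# Hypothetical alternative: count ties and add the observation itself,
# so the estimate can never be exactly zero.
p_inclusive <- sapply(mut, function(x) (sum(wt >= x) + 1) / (length(wt) + 1))

data.frame(mut, p_strict, p_inclusive)
```

With the strict form, a mutated sample that ties the highest WT value gets exactly 0, the same as one far above every WT sample; the inclusive form keeps the two apart (2/11 versus 1/11 in this toy case).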

Download the VAF, inferred percentiles and p-values of the mutated samples:

23 KDM6A chrX,45062737,C,T

Variant found in 2 patients of the BeatAML cohort (2 samples).

  • Patients with KDM6A chrX,45062737,C,T variant: 2 patients (2 samples)
  • Patients with the variant and RNASeq for validation: 2 patients (2 samples)

The splicing alterations being assessed are:

  • Donor Gain: predicted 2 bp from the variant, at chrX:45062740; not found in the splice junction collection.
  • Donor Gain: chrX:45062720-45063421, found in the mutated samples.
  • Donor Loss: chrX:45062749-45063421; donor splice site chrX:45062749.

Variant information:

Load the extracted splice junctions of the gene harboring the mutation.

extractedSJ_path <- paste0(extractedSJ_dir_in,"KDM6A_UM_annotSJ.tsv")
GeneSJ <- read.delim(extractedSJ_path, sep ="\t")

Set each sample’s group: Mutated (MUT) or Non-Mutated (WT)

samples_df <- found_variants[found_variants$Gene=="KDM6A" & found_variants$MutationKey_Hg38 == "chrX,45062737,C,T",]

cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases , "MUT", "WT")
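
The group assignment can be checked on a toy example. The sketch below reuses the column names seen in the report (`Gene`, `MutationKey_Hg38`, `RNA_Sample`, `Validable`, `sample_id`), but all rows are made up, and the value "Not validable" is an assumed label for samples that cannot be validated. It adds one sanity check: every validable case should be present in the junction table.

```r
# Made-up variant table using the column names seen in the report.
found_variants <- data.frame(
  Gene = c("KDM6A", "KDM6A", "NRAS"),
  MutationKey_Hg38 = c("chrX,45062737,C,T", "chrX,45062737,C,T",
                       "chr1,114716123,C,T"),
  RNA_Sample = c("S1", NA, "S3"),
  Validable = c("Validable", "Not validable", "Validable"),  # 2nd label assumed
  stringsAsFactors = FALSE
)
GeneSJ <- data.frame(sample_id = c("S1", "S2", "S3", "S4"),
                     stringsAsFactors = FALSE)

samples_df <- found_variants[found_variants$Gene == "KDM6A" &
  found_variants$MutationKey_Hg38 == "chrX,45062737,C,T", ]
cases <- samples_df$RNA_Sample[samples_df$Validable == "Validable"]
GeneSJ$GROUP <- ifelse(GeneSJ$sample_id %in% cases, "MUT", "WT")

# Sanity checks: validable cases missing from the junction table, and the
# group sizes, which should match the counts stated for the variant.
missing_cases <- setdiff(cases, GeneSJ$sample_id)
table(GeneSJ$GROUP)
```

Note that a sample carrying a different variant (S3 here) is treated as WT for the variant under study.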

23.1 SJ Lookup

Search for the splice junctions of interest in the extracted splice junctions of the gene by position (chr_SJstart_SJend).

23.1.1 Donor Gain

Search: predicted 2 bp from the variant, chrX:45062740

Show all the splice junctions with a coordinate between 45062740 and 45062749. Only junctions starting at the canonical donor, chrX:45062749, are found:

colnames(GeneSJ)[grep("4506274",colnames(GeneSJ))]
## [1] "chrX_45062749_45063379" "chrX_45062749_45063392" "chrX_45062749_45063421"
## [4] "chrX_45062749_45063434"

Show all the splice junctions with a coordinate between 45062730 and 45062739:

colnames(GeneSJ)[grep("4506273",colnames(GeneSJ))]
## character(0)

Alternative SJ not found in the splice junction collection.

23.1.2 Donor Gain

Search: chrX:45062720-45063421

colnames(GeneSJ)[grep("4506272",colnames(GeneSJ))]
## [1] "chrX_45062720_45063392" "chrX_45062720_45063421" "chrX_45062720_45063434"
## [4] "chrX_45062724_45063421"

Show the splice junction that matches chrX:45062720-45063421 exactly:

colnames(GeneSJ)[grep("45062720_45063421",colnames(GeneSJ))]
## [1] "chrX_45062720_45063421"
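
The lookups above match junction names on a shared digit prefix, which covers a window of ten positions at a time. As a sketch of an alternative, the hypothetical helper `find_sj` below parses the chr_SJstart_SJend column names and filters on an arbitrary coordinate window. The column names are made up for illustration, and the helper assumes chromosome names without underscores.

```r
# Made-up column names following the chr_SJstart_SJend convention.
sj_names <- c("sample_id", "chrX_45062720_45063421", "chrX_45062724_45063421",
              "chrX_45062749_45063421", "chrX_45061420_45062646")

# Hypothetical helper: return junction names with a start or end coordinate
# inside [from, to], instead of matching on a shared digit prefix.
find_sj <- function(col_names, chrom, from, to) {
  sj <- grep("^chr[^_]+_[0-9]+_[0-9]+$", col_names, value = TRUE)
  if (length(sj) == 0) return(character(0))
  parts <- do.call(rbind, strsplit(sj, "_", fixed = TRUE))
  start <- as.numeric(parts[, 2])
  end <- as.numeric(parts[, 3])
  hit <- parts[, 1] == chrom &
    ((start >= from & start <= to) | (end >= from & end <= to))
  sj[hit]
}

# Junctions with a coordinate between 45062720 and 45062740.
find_sj(sj_names, "chrX", 45062720, 45062740)
```

Called on `colnames(GeneSJ)`, such a helper would cover a window like 45062730-45062749 in a single query instead of one `grep` per digit prefix.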

Found: chrX:45062720-45063421

Reads of all the AML samples (mutated and non-mutated) for the splice junction:

GeneSJ$chrX_45062720_45063421
##   [1] 0 0 0 0 0 0 0 1 1 0 0 0 0 2 0 2 0 0 0 0 0 0 1 5 0 0 0 0 2 0 0 1 0 0 2 2 2
##  [38] 1 0 4 0 0 0 0 0 3 0 0 0 2 2 4 1 3 5 0 1 3 1 0 1 3 0 1 0 0 0 1 1 0 0 1 2 0
##  [75] 0 0 7 3 0 0 1 0 3 2 2 1 0 1 2 0 3 5 5 4 0 0 0 0 1 5 5 0 1 2 2 0 2 2 2 2 0
## [112] 1 0 2 0 0 0 0 1 3 0 7 0 5 2 5 1 1 0 0 3 0 0 2 0 0 0 1 1 3 1 0 1 0 0 4 0 3
## [149] 0 0 1 1 0 0 2 0 2 1 0 5 0 0 0 2 2 0 1 1 0 1 6 4 0 0 1 0 2 4 2 0 0 0 1 0 0
## [186] 0 1 0 0 0 3 3 3 0 1 1 0 1 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 1 5 1 4 0 0 0 0 0
## [223] 4 1 3 3 6 0 0 0 0 0 4 2 2 3 0 2 0 0 0 1 1 2 0 0 2 0 1 3 3 5 1 1 2 4 4 0 0
## [260] 0 0 0 2 1 0 0 0 1 1 0 0 1 2 0 2 2 0 0 0 0 0 0 1 0 3 0 2 2 2 0 0 0 1 2 2 0
## [297] 0 0 0 0 2 0 3 0 0 0 4 2 0 2 0 0 6 0 8 0 1 0 3 3 0 1 3 0 0 0 0 2 0 1 0 2 1
## [334] 2 1 0 0 1 4 2 5 1 0 2 0 0 1 0 0 3 3 1 3 0 0 0 0 0 0 0 1 0 2 0 2 2 2 2 3 1
## [371] 1 4 1 0 1 3 0 0 1 3 0 2 0 0 0 2 1 0 0 2 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
## [408] 3 0 4 0 0 1 2 0 0 2 0 1 0 0 2 0 1 0 3 1 1 0 0 0 3 0 2 0 0 3 1 1 2 0 0 0 0
## [445] 0 1 0 1 4 0 1 1 0 0 1 0 0

Samples with the SJ of interest:

table(GeneSJ$chrX_45062720_45063421>0) 
## 
## FALSE  TRUE 
##   235   222

Groups of the samples having the alternative splice junction:

table(GeneSJ$GROUP[GeneSJ$chrX_45062720_45063421 > 0])
## 
## MUT  WT 
##   2 220

Alternative SJ found in both mutated samples, and also in 220 WT samples.

23.1.3 Canonical SJ

Exon upstream (UE): chrX:45061420-45062646

Exon downstream (DE): chrX:45062749-45063421; donor splice site: chrX:45062749

Reads of all the AML samples (mutated and non-mutated) for the upstream splice junction (chrX:45061420-45062646):

GeneSJ$chrX_45061420_45062646
##   [1]  40  35  93   5 101  34  52  61  18 120  41  28  95 116  28  48  44  72
##  [19] 124  12 120  60  49 125  96  52  76   0  96  71  47 130  28  37  93  61
##  [37]  43  81  43 231  62  40  47  41  29 140  13  63  74  31  94 307  59  43
##  [55] 173  34 150  73  52  52 160 200  22 127  47  67  36  54  34  61  56  53
##  [73]  71  72  39 152 364 107 134   7  50  19 135  61  82  37  74  38  49  48
##  [91] 161 108 287 223 113  63  41  45  19  73 201  48  33  31  24  50  69  31
## [109]  88  93 105  49  75  61  50  63  84  25  42  45  85  63  43 102  58 163
## [127] 124  18  45  67  76  81  22 119  39  16  77 116  58  67  42  46 309  45
## [145]  33 219  17 128 132  49  29 205 214  72  57  60 101  96  47  49  45 159
## [163]  27  71  76  29  87  57  15  66  74 107  44  62  41  81  65  80  56 158
## [181]  65  40  45  46  99  47  77  37  61 198  67 156 208  40  26 110  40  44
## [199]  42  15  50  52 138  80  22  61  81  66  92  44  49  56  85  61 136  56
## [217]  49  85  29  85 144  97  77 184  81  53 126  59  30  76  32 187 164  34
## [235]  43  68  72  62  25  72  52  56  44  55  66 239 115   0  45  76 184 127
## [253]  76  63  35  94  91  49  31  12  35  36  64  67  48  81  48  40  66  65
## [271] 138  28  49  57 101  29  32  27 132  41  30  39  67  90 104  36  45 103
## [289]  55  54  75 109  56  35  27  28  40  53  67  82  71 144  98  52  61 105
## [307]  33  78  69  85  24   6  93  21  85  33  54  49 127 117  96  96 157  44
## [325]  43  48  39  33  37  46  62  54  70  59  36  45  36  26  38  17  47  94
## [343]  46 156  14  66  44  40  72  37  50 103  75  94  46  96  50  54  73  59
## [361] 110 102  84 124  55  33  37  73 115  38  57  90  42  92 288  78  39 101
## [379]  91  39   0  54  40  58 119  72  87  34  45  38 128  60  66  13  41  31
## [397]  34  24  30  44  54 211  40  24  77  33  15 138 107  53  43  36  40  31
## [415]  34  96  78  63 100 130  61  83 146 146  32 103  72  44  85  81  80  36
## [433]   0 100  34  93  42  92  35 104  55 141  43  58  67  41  14 130  33  56
## [451]  27  40  62  49  97 110  88

Reads of all the AML samples (mutated and non-mutated) for the downstream splice junction (chrX:45062749-45063421):

GeneSJ$chrX_45062749_45063421
##   [1]  39  38  85   3  49  40  32  38  13  63  45  29  84  86  34  41  39  57
##  [19] 100  14  80  64  40 108  93  39  67   1  83  53  26 109  15  32  42  54
##  [37]  41  55  31 186  36  30  46  33  37 109  15  63  61  19  72 212  39  22
##  [55] 136  24 116  57  55  79 120 191  29  61  47  46  19  50  29  49  41  53
##  [73]  81  50  27 105 265 115 103   7  42  20 126  60  76  18  46  23  62  64
##  [91] 112  72 222 171  76  60  35  35  19  66 155  26  40  30  26  61  58  18
## [109]  71  78  90  41  42  32  42  70  51  16  32  35  64  52  43  93  52 149
## [127] 110  19  53  47  63  68  21  65  54  23  74  88  63  57  32  50 225  38
## [145]  32 190  19  95 112  38  31 173 168  58  49  40  95  75  31  20  50 155
## [163]  22  61  38  33  62  54  20  44  67  88  22  70  52  69  41  74  43 137
## [181]  43  45  24  32  59  46  54  31  40 162  36  78 152  38  24 104  47  39
## [199]  50  25  36  48 103  71  23  35  49  41  75  53  35  54  54  43 122  50
## [217]  47  56  22  71 101  72  69 164  60  55  98  49  30  44  25 143 114  21
## [235]  30  56  54  77  15  57  66  49  37  28  65 185 112   0  28  42 153  99
## [253]  69  59  32  88  60  61  24  13  26  29  52  51  25  68  52  29  58  49
## [271] 110  27  38  29  80  25  27  25 127  31  25  31  58  74  84  21  42  62
## [289]  47  49  56  97  40  42  16  25  45  42  69  59  76 102  85  54  59  83
## [307]  32  68  78  67  25   2  66  13  96  26  46  44 106  98  87 101 103  39
## [325]  46  33  44  44  35  36  50  42  59  42  37  32  36  27  38  14  43  66
## [343]  34 117  18  40  33  30  35  38  47  72  77  79  45  71  35  52  52  25
## [361]  61  97  90  82  69  34  41  62  70  27  64 102  33  73 206  72  27  99
## [379]  65  39   0  38  28  46  96  68  65  32  36  39 120  63  43  14  56  40
## [397]  33  26  28  48  54 128  33  38  64  26  17 113  66  56  37  27  29  36
## [415]  34  89  50  35  60  74  45  73 141 121  34  81  84  26  61  63  57  34
## [433]   0 102  17  89  35  87  35  93  37 127  56  50  62  42   7 145  32  29
## [451]  42  37  46  48  55 112  55

23.2 Normalization

Count the reads of all the splice junctions of the gene harboring the variant:

GeneSJ$rowSum_SJtotal <- rowSums(GeneSJ[,grep("chr", names(GeneSJ))])

Normalization of the expression by the total read counts of all the splice junctions of the gene:

GeneSJ$Normalized_CanonDE <- (GeneSJ$chrX_45062749_45063421)/GeneSJ$rowSum_SJtotal*100
GeneSJ$Normalized_DG <- (GeneSJ$chrX_45062720_45063421)/GeneSJ$rowSum_SJtotal*100

Download the normalized values for the assessed splice junctions of all the AML samples:

23.3 VAF

VAF of the mutated samples:

23.4 Plots

23.4.1 Static Dot Plots

Canonical splice junction:

Splicing alterations:

23.4.2 Interactive Dot Plots

Canonical splice junction:

Splicing alteration:

23.4.3 Violin Plots

Violin Plots for the alternative splice junctions interrogated:

Donor Loss:

Donor Gain:

23.5 Statistical Analysis

SJCounts <- GeneSJ

23.5.1 Donor Loss

23.5.1.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"]
## W = 0.99066, p-value = 0.005593

Mean normalized expression of the canonical SJ in the WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_CanonDE[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 2.683123

Normalized expression of the canonical SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_CanonDE[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 2.614379 3.139013

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_CanonDE - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] -0.06874374  0.45589063

23.5.1.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:448] = -2.6831, -2.2899, -2.0949,  ..., 1.9441, 2.1063
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.4461538 0.7208791
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_CanonDE")]
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- MUT_df$ECDF

MUT_df$Prediction <- "Donor Loss"
MUT_df$splice_junction_status <- "CanonicalSJ"
MUT_df$splice_junction_position <- "chrX:45062749-45063421"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

23.5.2 Donor Gain

23.5.2.1 Expression Difference

Normality Test:

shapiro.test(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"])
## 
##  Shapiro-Wilk normality test
## 
## data:  SJCounts$Normalized_DG[SJCounts$GROUP == "WT"]
## W = 0.73658, p-value < 2.2e-16

Mean normalized expression of the alternative SJ in the WT samples:

mean_WT_SJi <- mean(SJCounts$Normalized_DG[SJCounts$GROUP == "WT"], na.rm=TRUE)
mean_WT_SJi
## [1] 0.05049228

Normalized expression of the alternative SJ in the MUT samples:

MUT_SJi <- SJCounts$Normalized_DG[SJCounts$GROUP == "MUT"]
MUT_SJi
## [1] 0.1307190 0.1494768

Deviation from the mean normalized expression:

SJCounts$Difference <-  SJCounts$Normalized_DG - mean_WT_SJi

Difference in the MUT samples: deviation of each MUT sample's normalized expression from the mean normalized WT expression

SJCounts$Difference[SJCounts$GROUP == "MUT"]
## [1] 0.08022668 0.09898455

23.5.2.2 ECDF - Pvalue Inference

v_ecdf <- ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
v_ecdf
## Empirical CDF 
## Call: ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"])
##  x[1:217] = -0.050492, -0.034438, -0.032458,  ..., 0.3371, 0.35437
plot(ecdf(SJCounts$Difference[SJCounts$GROUP == "WT"]))

v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
## [1] 0.8791209 0.9142857
MUT_df <- SJCounts[SJCounts$GROUP == "MUT",c("sample_id","case_id", "Normalized_DG")] 
colnames(MUT_df) <- c("sample_id", "case_id", "NormalizedExpression")

MUT_df$ECDF <- v_ecdf(SJCounts$Difference[SJCounts$GROUP == "MUT"])
MUT_df$Pvalue <- 1 - MUT_df$ECDF

MUT_df$Prediction <- "Donor Gain"
MUT_df$splice_junction_status <- "AlternativeSJ found in MUT samples"
MUT_df$splice_junction_position <- "chrX:45062720-45063421"

MUT_df$ECDF <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$ECDF)
MUT_df$Pvalue <- ifelse(MUT_df$NormalizedExpression == 0, NA,MUT_df$Pvalue)

Download the VAF, inferred percentiles and p-values of the mutated samples:

24 Results Summary

Download the VAF, inferred percentiles and p-values of all the splicing alterations evaluated:
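
One way such a summary table could be assembled is sketched below. This is an assumption about the workflow, not the report's own code: `make_result` is a hypothetical stand-in for the per-alteration `MUT_df` tables, the sample IDs and p-values are made up, and the output goes to a temporary file.

```r
# Made-up per-alteration result tables with a subset of the MUT_df columns.
make_result <- function(prediction, position, pvalue) {
  data.frame(
    sample_id = c("S1", "S2"),
    case_id = c("C1", "C2"),
    Pvalue = pvalue,
    Prediction = prediction,
    splice_junction_position = position,
    stringsAsFactors = FALSE
  )
}
results <- list(
  make_result("Donor Loss", "chrX:45062749-45063421", c(0.45, 0.72)),
  make_result("Donor Gain", "chrX:45062720-45063421", c(0.12, 0.09))
)

# Stack the tables and write one tab-separated summary file.
summary_df <- do.call(rbind, results)
out_path <- tempfile(fileext = ".tsv")
write.table(summary_df, out_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

Collecting each `MUT_df` in a list as it is produced avoids losing earlier results when the variable is overwritten by the next alteration.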